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Abstract 

During  the  past  year,  we  worked  on  two  engineering  application 
problems:  data  smoot hing/denoising  by  wavelets  and  reverse  engineer¬ 
ing  for  aircraft  design,  as  well  as  other  optimization  problems  (such 
as  robust  regressions,  Huber  M-estimators,  merit  functions  for  con¬ 
strained  minimization  problems,  stability  analysis  of  feasible  systems, 
and  constrained  best  approximations  in  Euclidean  spaces).  While  the 
engineering  applications  are  obviously  beneficial  to  the  advancement 
of  science  and  technology  for  Air  Force,  the  basic  research  is  also  fun¬ 
damental  for  potential  applications  relevant  to  Air  Force.  In  this  final 
report,  we  will  give  an  overview  of  what  we  have  accomplished  during 
the  last  one  year  when  our  research  was  supported  by  Air  Force  Office 
of  Scientific  Research. 

The  report  has  five  parts:  (i)  automatic  threshold  selection  for 
wavelet  denoising;  (ii)  reverse  engineering  for  aircraft  design;  (iii)  ro¬ 
bust  regression  and  Huber  M-estimator;  (iv)  global  error  bounds  for 
quadratic  inequalities;  (v)  merit  functions  for  complementarity  prob¬ 
lems. 
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1  Automatic  Threshold  Selection  for  Wavelet  De¬ 
noting 

Recently,  one  major  trend  in  wavelet  research  is  an  optimal  representation 
of  a  signal  by  a  library  (or  dictionary)  of  basis  functions,  led  by  Mallat, 
Coifman,  and  Donoho  [1,  2,  3,  4,  8].  The  mathematical  objective  is  to  find  a 
representation  of  a  signal  by  a  linear  combination  of  as  few  functions  in  the 
library  as  possible  within  certain  error  tolerance.  The  motivation  for  such 
an  optimal  representation  of  a  signal  is  to  achieve  data  compression,  since 
one  only  has  to  store  the  coefficients  an  approximate  representation  of  the 
actual  signal. 

However,  such  an  optimal  representation  is  useful  not  only  for  data  com¬ 
pression,  but  also  for  denoising  of  signals  and  feature  extraction  of  speech 
signals,  as  pointed  out  by  Buckheit  and  Donoho  [1].  Our  preliminary  re¬ 
search  [5]  as  well  as  Buckheit  and  Donoho’s  indicates  that  wavelet  packet 
representation  may  provide  better  feature  extraction  than  other  classical 
classification  method,  such  as  linear  discriminant  analysis.  Here  we  first 
give  a  brief  review  on  denoising  and  feature  extraction  by  wavelets.  Then 
we  report  our  progress  made  on  Donoho  and  Johnstone’s  wavelet  denoising 
method  for  recovering  antenna  radiation  pattern  from  noisy  measurements. 

1.1  Review  on  Denoising  and  Feature  Extraction 

The  wavelet  packet  transform  is  generated  by  a  pair  of  quadratic  mirror 
filters  which  decompose  the  signal  into  a  series  of  subbands  (“frequency 
slots”)  by  repeated  convolution  and  decimation.  The  wavelet  packet  coeffi¬ 
cients  generated  by  such  a  transform  represent  the  energy  levels  of  the  signal 
at  different  time  locations  and  various  frequency  bands.  With  appropriate 
choice  of  subbands  and  and  filters,  it  is  possible  to  “capture”  the  signal  with 
very  few  significantly  large  coefficients. 

Let  {fif(rc)}n= i  (high-frequency  filter)  and  {h(n)}£=1  ( g(n )  =  (-l)nh(l- 
n))  (low-frequency  filter)  be  a  pair  of  finite  impulse  response  (FIR)  quadra¬ 
ture  filters  (QF)  that  correspond  the  coefficients  in  the  equations  that  define 
scaling  and  mother  wavelet  functions: 

<p{t)  =  n), 

n 

iKO  =  X^(nM2<_n)’ 


2 


where  <p  is  the  scaling  function  and  ip  is  the  mother  wavelet. 

Given  a  sequence  {x(j)}f=i  of  N  samples  of  a  signal,  one  can  separate 
the  frequency  domain  of  the  signal  into  two  subbands  of  the  same  width  by 
using  the  following  transformation: 

yi(k)  =  'Znh(n)x(2k-n), 

yo(k)  =  '£ng(n)x{2k-n),  '* 

where  yo  and  y\  contain  the  information  of  the  signal  in  the  high  and  low 
frequency  subbands,  respectively.  The  coefficients  yo(&)’s  and  yi(fc)’s  are 
called  wavelet  packet  coefficients  at  the  first  level.  Repeating  the  same 
operation  on  a  subband,  we  can  get  better  frequency  separation  of  the  signal 
at  the  cost  of  lower  resolution  in  time  domain  for  the  signal  representation. 
This  phenomena  is  dictated  by  the  uncertainty  principle  in  signal  processing. 
For  speech  recognition  problems,  the  design  of  subbands  for  the  wavelet 
packet  transform  of  signals  is  a  very  important  issue,  since  a  signal  tends  to 
have  energy  concentrated  on  certain  frequency  bands.  If  these  subbands  are 
used  in  the  wavelet  packet  transform,  then  one  might  be  able  to  use  a  few 
coefficients  in  these  subbands  to  represent  the  signal.  As  a  result,  one  can 
achieve  data  compression  of  the  signal  (a  compressed  representation  of  the 
signal  by  a  few  coefficients).  This  compressed  representation  can  be  used 
for  either  denoising  or  feature  extraction.  If  one  uses  these  coefficients  to 
reconstruct  a  signal  by  the  inverse  wavelet  packet  transform  (the  transform 
(1)  is  invertible),  then  it  will  be  a  smoothed  version  of  the  original  signal. 
This  is  the  idea  behind  wavelet  shrinkage  for  denoising.  However,  if  one 
uses  these  coefficients  as  the  input  of  a  classification  program,  such  as  a 
neural  network  or  a  multisurface  method  of  pattern  separation  [20],  then 
these  coefficients  are  called  features  of  the  signal  in  speech  recognition  [1, 
5].  Therefore,  the  compressed  representation  of  the  signal  provides  a  new 
approach  for  feature  extraction  in  speech  recognition. 

For  a  special  wavelet  packet  transform,  we  get  a  set  of  wavelet  packet 
coefficients  from  the  input  signal 

{Wiij:  i  =  1, j  =  l,---,2r’}, 

where  Wij  represents  the  energy  concentration  at  the  i-th  frequency  subband 
and  near  the  time  location  T.  The  parameter  T  is  the  time  duration  of 
the  signal:  T  =  ^r,  where  S  is  the  sampling  rate  of  the  signal. 

For  speech  recognition  problems,  instead  of  finding  a  best  basis  for  all 
signals  [1],  we  used  the  averages  of  wavelet  packet  coefficients  Wij's  as  fea- 
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tures  [5].  For  isolated  stops,  the  six  consonants  /b,p,d,t,g,k/  can  be  auto¬ 
matically  classified  with  over  93%  accuracy  based  on  extracting  83  features 
out  of  a  75  ms  signal  interval  beginning  with  the  release  of  the  burst.  For 
stops  extracted  from  continuous  speech  samples,  we  achieve  an  accuracy  of 
71%  using  99  features  out  of  an  83  ms  signal  interval.  These  relatively  poor 
results  are  still  comparable  with  a  method  using  formant  trajectories.  How¬ 
ever,  they  are  not  yet  as  good  as  results  based  on  smoothed  time/frequency 
features  [5].  One  possible  reason  is  that  each  class  of  consonants  has  certain 
ridge  signatures  on  the  time/frequency  plane  and  the  features  based  on  ridge 
signatures  reflect  more  of  the  characteristics  of  speech  signals. 

As  for  signals  received  by  antenna  radiation  pattern  measurement  equip¬ 
ment,  we  realized  that  the  radar  pattern  nulls  were  mistaken  as  noisy  spikes 
and  were  suppressed  by  wavelet  shrinkage  method  for  denoising.  This  con¬ 
firms  Coifman  and  Donoho ’s  conclusion  that  pseudo- Gibbs  phenomena  hap¬ 
pens  in  the  neighborhood  of  discontinuities  [3].  They  attribute  this  to  the 
lack  of  translation  invariance  of  the  wavelet  basis.  Their  strategy  to  reduce 
the  Gibbs  phenomena  is  to  average  out  the  translation  dependence  by  shift¬ 
ing  the  signal,  denoising  the  shifted  signal,  and  unshifting  the  smoothed 
signal.  The  method  is  called  translation-invariant  denoising  [3].  Whether 
such  a  method  can  be  used  to  preserve  the  radar  pattern  nulls  remains  to 
be  seen. 

1.2  Progress  on  Wavelet  Denoising 

Given  iV(=  2k )  samples  of  a  signal,  from  the  statistical  theory  about  the 
optimal  minimax  rate  of  convergence  established  by  Donoho  and  Johnstone, 
we  know  that  it  is  desirable  to  choose  level-dependent  thresholds  based  on 
the  standard  deviation  of  the  noise  signal.  For  example,  Donoho  and  John¬ 
stone  proposed  to  use  the  following  thresholds  for  thresholding  of  wavelet 
coefficients: 

=  c,<7,  ■  (2) 

where  c;  are  some  constants.  The  level-dependent  thresholding  can  be  con¬ 
sidered  as  a  black  box  that  takes  a  set  of  wavelet  coefficients  {wij  :  1  <  j  < 
21“1, 1  <  i  <  k}  and  output  another  set  of  wavelet  coefficients: 

{t&jj  :  1  <  j  <  2*-1, 1  <  i  <  k }, 

where  u>,j  =  0  if  <  <r,  and  =  tn.-j  (l  —  otherwise.  Then 

one  can  construct  a  unique  signal  *1,  •  •  *,xn  corresponding  to  the  modified 
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wavelet  coefficients  Wij.  By  the  results  established  by  Donoho  and  John¬ 
stone,  we  can  consider  x  as  an  optimal  recovery  of  the  uncontaminated  signal 
in  the  minimax  sense. 

However,  in  a  practical  situation,  the  standard  deviation  of  the  noise 
signal  is  generally  unknown.  This  causes  a  serious  problem  for  engineers 
who  would  apply  the  wavelet  denoising  method  in  applications.  In  order  to 
derive  a  practical  method  for  automatic  selection  of  the  threshold,  we  need 
to  consider  the  threshold  selection  from  a  different  perspective.  Suppose 
that  we  know  the  standard  deviation  a  of  the  noise  signal  e.  Then  the 
thresholds  for  various  levels  defined  in  (2)  yield  an  optimal  recovery  x  from 
x.  Intuitively,  we  can  assume  that,  with  a  high  probability,  x  is  very  close  to 
the  uncontaminated  signal.  That  is,  x  —  x  should  be  very  close  to  the  noise 
signal  e.  As  a  result,  the  standard  deviation  a  of  x  —  x  is  approximately 
<7.  Therefore,  one  statistical  criterion  for  a  good  estimate  <7  of  the  standard 
deviation  of  the  noise  is  that  a  «  <7. 

To  implement  such  an  idea,  we  use  a  control  parameter  £(>  0)  as  the 
error  tolerance  of  the  threshold  estimate.  Then  we  try  to  find  the  largest  a 
such  that  |^z£.|  <  S. 

If  one  wants  the  optimal  choice  of  <7,  then  one  has  to  solve  the  nondif- 
ferentiable  global  minimization  problem: 

0  -  a 
min  - 

er>0  O 

Note  that  lim^oo  g(a)  =  lima_+o  <7(V)  =  1-  Therefore,  the  above  minimiza¬ 
tion  has  a  global  solution. 

We  implemented  the  automatic  threshold  selection  for  wavelet  shrinkage 
denoising.  The  application  of  this  method  to  signals  received  by  antenna 
radiation  pattern  measurement  equipment  at  Airforce  Rome  Lab  shows  that 
the  method  can  recover  the  noise-free  signal  quite  accurately  near  the  main- 
lobe  and  dominant  sidelobes  (cf.  Figures  8  and  9).  Such  a  denoising  tech¬ 
nique  not  only  provides  a  smoothed  signal,  but  also  gives  an  empirical  esti¬ 
mate  of  the  standard  deviation  of  the  noisy  signal. 

A  potential  application  of  such  a  denoising  technique  is  to  filter  out  the 
noise  in  antenna  radiation  pattern  measurement  so  that  the  true  antenna 
gain  pattern  versus  orientation  angle  can  be  discerned.  Antennas  that  oper¬ 
ate  at  40  GHz  and  above  are  being  developed  that  need  to  be  tested  against 
their  design  characteristics.  This  means  that  power  amplifiers  capable  of 
operating  at  these  frequencies  are  also  required  for  far-field  antenna  pattern 


=:  Sf(ff). 
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measurement.  These  state-of-the-art  amplifiers  are  either  costly  or  nonex¬ 
istent.  Thus,  getting  the  most  signal-to- noise  ratio  is  essential  for  reducing 
measurement  costs. 


1.3  Conclusions 

As  we  pointed  out  before,  the  radar  pattern  nulls  can  be  easily  mistaken 
as  noise  by  a  nonparametric  data  smoothing/denoising  method.  For  ex¬ 
ample,  Donoho  and  Johnstone’s  wavelet  shrinkage  denoising  method  might 
destroy  the  null  structure  near  the  sidelobes  (compare  the  smoothed  sig¬ 
nal  and  noiseless  signal  between  —40 — 30  in  Figure  9).  One  approach  for 
maintaining  the  null  structure  is  to  set  lower  thresholds  near  the  nulls  for 
wavelet  coefficients  corresponding  to  high  frequency  subbands  to  preserve 
the  null  structure.  We  will  continue  to  explore  various  denoising  approaches 
for  processing  of  signals  received  by  antenna  radiation  pattern  measurement 
equipment  and  our  objective  is  to  make  a  technology  transition  of  the  state- 
of-art  wavelet  denoising  techniques  to  Airforce. 
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2  Reverse  Engineering  for  Aircraft  Design 

Reverse  engineering  is  a  relatively  new  term  that  emerged  in  the  late  1980’s. 
In  general,  reverse  engineering  is  a  process  that  begins  with  a  physical  form 
and  ends  with  a  computer  representation  of  the  form.  With  the  recent 
developments  of  advanced  laser  digitizers  and  other  computer-aided  tools, 
reverse  engineering  has  evolved  from  a  research  initiative  to  an  industrial 
reality. 

Reverse  engineering  has  multiple  applications,  including  the  following: 

•  manufacturing  a  mechanical  part  from  a  new  physical  model  or  pro¬ 
totype, 

•  replicating  an  existing  mechanical  part  that  does  not  have  a  computer- 
recognizable  design, 

•  analyzing  or  modifying  a  mechanical  part  that  does  not  have  a  computer- 
recognizable  design, 

•  verifying  a  mechanical  part  that  has  a  computer-recognizable  design 
[3]. 

Reverse  engineering  consists  of  four  steps:  data  acquisition,  data  sepa¬ 
ration,  curve  or  surface  fitting,  computer-aided  manufacturing.  Data  acqui¬ 
sition  is  the  collection  of  data  points  from  the  surface  of  a  object.  Normally, 
this  is  done  by  using  a  coordinate  measuring  machine  (CMM)  or  a  laser 
scanner.  Because  most  objects  are  complicated  (i.e.,  they  are  the  compos¬ 
ite  of  multiple  geometric  shapes),  it  is  difficult  to  represent  the  object  with 
one  matematical  surface.  Therefore,  it  is  natural  to  separate  the  collected 
data  into  several  components  with  simple  geometric  shapes.  Then,  one 
can  fit  a  curve  or  surface  equation  to  the  data  of  each  component.  Together 
these  equations  provide  a  mathematical  surface  representation  of  the  object. 
This  representation  or  mathematical  surface  model  can  be  programmed  into 
computer-aided  manufacturing  (CAM)  tools,  which  are  used  to  manufacture 
a  facimile  of  the  original  object. 
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From  a  mathematical  point  of  view,  the  difficulty  of  the  reverse  engi¬ 
neering  process  is  the  construction  of  surface  equations  from  a  set  of  data 
points.  There  are  software  packages  that  were  developed  recently  for  the 
reverse  engineering  process,  rather  than  for  engineering  design.  Examples 
of  these  packages  are  “SURFACER”  by  Imageware,  Inc.,  and  “STRIM” 
by  Matra  Datavision.  However,  each  application  imposes  different  crite¬ 
ria  for  goodness-of-fit.  For  example,  the  objective  of  reverse  engineering  a 
car  model  is  complete  different  from  that  of  reverse  engineering  an  aircraft 
model.  The  former  has  more  emphasis  on  aesthetics  and  the  latter  has  more 
emphasis  on  geometric  shape  and  accurary. 

The  main  purpose  of  reverse  engineering  an  aircraft  surface  is  to  use 
the  original  model  as  a  starting  point  to  design  experimental  aircraft.  His¬ 
torically,  aircraft  design  has  been  divided  into  three  phases:  (1)  conceptual 
design;  (2)  preliminary  design;  and  (3)  detailed  design.  A  new  methodology 
is  to  use  a  multi-disciplinary  optimization  (MDO)  approach  that  combines 
surface  geometry  design,  surface  or  volume  grid  generation,  and  aerody¬ 
namic  optimization  based  on  CFD  analysis.  With  the  MDO  approach,  the 
designer  takes  an  existing  surface^  design,  generates  the  appropriate  grids, 
and  performs  a  CFD  analysis  of  the  design.  The  designer  uses  the  analysis 
to  identify  aerodynamic  features  that  are  not  optimal.  The  designer  then 
modifies  the  design  parameters  associated  with  the  aerodynamic  features. 
This  generates  a  new  surface  design.  The  designer  repeats  this  process  until 
he  achieves  the  model  with  the  desired  aerodynamic  features. 

The  MDO  approach  does  have  some  serious  difficulties.  First,  if  the 
surface  model  has  too  many  design  parameters,  the  computational  cost  to 
optimize  with  respect  to  design  parameters  may  be  prohibitive.  Second,  de¬ 
sign  parameters  within  feasible  ranges  produce  smooth  surfaces  (i.e.,  surfaces 
without  ripples).  When  spline  control  points  are  used  as  design  parameters, 
it  is  difficult  to  determine  the  relationship  between  the  control  points  and 
the  surface  ripples. 

To  avoid  these  difficulties,  a  practical  approach  is  to  use  a  small  set  of 
engineering  parameters  to  generate  an  aircraft  model.  This  is  the  basic  idea 
behind  the  Rapid  Aircraft  Parameter  Input  Design  (RAPID)  model,  devel¬ 
oped  by  Bloor,  Wilson,  and  Smith  [10].  With  respect  to  the  RAPID  model, 
the  reverse  engineering  problem  is  to  determine  the  engineering  design  pa¬ 
rameters  from  a  set  of  aircraft  surface  data  points.  Then,  one  can  optimize 
the  aerodynamic  features  of  the  model  by  modifying  the  appropriate  param¬ 
eters. 

In  this  section,  based  on  Bloor  and  Wilson’s  PDE  surface  model,  we  use 
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a  nonlinear  least  squares  method  to  extract  a  set  of  engineering  parameters 
from  surface  grids  of  an  aircraft. 

For  more  details,  see  the  joint  paper  written  by  J.  Huband,  R.  Smith, 
and  the  principle  investigator  [4]. 

2.1  Review  on  Reverse  Engineering 

For  surface  reconstruction  in  reverse  engineering  problems,  one  has  to  derive 
a  parametric  surface  representation  of  a  set  of  data  points  {(x*,  yt,  £,)}£_}  in 
space  R3: 

x  =  x(u,u),  y  =  y(u,v),  z  =  z(u,v)^  for  0  <  u,v  <  1.  (3) 

The  standard  procedure  for  constructing  a  surface  from  S  =  {(x;,  y ;, 
is  a  three-step  process: 

•  selection  of  4  sets  #3,1?4  °f  ordered  boundary  points  from  5; 

•  fitting  of  a  spline  curve  C*  of^each  boundary  set  Bi  such  that  Ci,  C2,  C3, 
C4  form  a  closed  loop; 

•  construction  of  a  surface  based  on  4  boundary  curves 
and  the  data  set  S. 

There  are  many  papers  on  construction  of  a  surface  from  the  data  set  S  if 
Z{  =  /(xt-,  yi)  and  (xt-,  t/t)  are  in  a  rectangular  region  or  5  is  a  set  of  grid 
points  on  a  surface  [1,  2,  3,  6,  7,  8].  However,  for  digitized  surface  of  a 
mechanical  part  such  as  a  wing  of  an  aircraft,  there  is  no  structure  pattern 
for  5.  At  NASA  Langley  Research  Center,  the  surface  construction  is  done 
by  using  a  state-of-art  reverse  engineering  software  called  SURFACER  [9] 
developed  by  Imageware  Inc.  The  selection  of  boundary  points  has  to  be 
done  manually  through  a  point  clicking  interface.  Even  though  the  interface 
is  very  easy  to  use,  the  process  is  quite  tedious  and  time-consuming.  The 
fitting  process  used  by  SURFACER  is  a  variation  of  thin  spline  fitting,  which 
is  very  sensitive  to  the  results  of  boundary  curve  fitting.  For  example,  the 
knots  for  the  final  surface  are  completely  predetermined  by  the  knots  of 
the  boundary  curves.  As  a  consequence,  a  successful  surface  construction 
relies  on  the  expertise  of  a  designer/modeller  who  goes  though  many  trials 
of  boundary  and  surface  fitting  to  get  a  visually  satisfactory  result.  The 
major  difficulty  in  such  a  surface  construction  process  is  how  to  determine 
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the  corresponding  parameter  (u,  ,  v,)  of  (x,-,  2,)  for  a  given  parametric  form 

of  surfaces  (3).  There  is  no  mathematical  research  done  so  far  on  this  topic. 

For  aircraft  designs,  a  spline  surface  does  not  make  much  sense  to  en¬ 
gineers.  Therefore,  surface  models  based  on  engineering  (or  geometric)  pa¬ 
rameters  (such  as  the  wing  section  chord  length,  the  wing  section  maximum 
thickness,  the  location  of  maximum  camber,  cf.  Figures  3-7  by  Robert  Smith 
at  NASA/Langley  Research  Center)  are  very  important  for  modification  of 
a  given  design.  Smith,  Bloor,  Wilson,  and  Thomas  [10]  developed  a  RAPID 
prototyping  model  for  airplane  design  at  NASA  Langley  Research  Center. 
However,  in  order  to  make  the  software  applicable  in  practice,  one  has  to 
convert  the  surface  grids  of  a  given  airplane  to  a  RAPID  model.  Then  one 
can  modify  the  RAPID  model  to  derive  a  design  with  optimal  aerodynamic 
characteristics.  This  often  involves  a  multidisciplinary  design  optimization 
process. 

2.2  Progress  on  Surface  Reconstruction 
neering 

Our  major  contribution  is  to  reverse  the  following 
prototyping  of  an  aircraft: 

Id2  (  a  \2  d2  V 

+  W)  F{(',,)  =  0'  tOT0-’’'(-1'  (4) 

F( 0,  rj)  =  P(l,  V),  for  0  <  7/  <  1,  (5) 

F(£,0)  =  Ro(£),  F(t,l)  =  B1(t),  for  0  <  £  <  1,  (6) 

.§-(£,0)  =  im  ^(£,l)  =  *i(£),  for  0  <  £  <  1,  (7) 

where  (5)  forces  the  solution  to  be  a  periodic  function  of  £  with  period  T  =  1, 
(6)  is  the  Dirichlet  boundary  condition,  and  (7)  is  the  Neumann  boundary 
condition. 

It  is  well-known  that  the  fourth-order  elliptic  PDE  (4)  with  the  given 
boundary  conditions  has  a  unique  solution.  This  provides  a  way  of  generat¬ 
ing  a  surface  patch  with  the  given  Dirichlet  and  Neumann  boundary  condi¬ 
tions,  which  is  the  mathematical  foundation  of  Bloor  and  Wilson’s  idea  for 
rapid  prototyping  of  an  aircraft  and  other  rapid  prototyping  applications. 

For  example,  in  the  design  of  the  inboard  wing  section,  one  usually  speci¬ 
fies  the  two  airfoils  at  the  two  ends  of  the  inboard  wing  section  (which  are  the 
intersection  of  the  inboard/outboard  wing  sections  and  the  intersection  of 
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the  inboard  wing/fuselage),  as  well  as  how  smooth  the  inboard  wing  section 
blends  with  the  fuselage  and  the  outboard  wing  section.  Note  that  the  two 
airfoils  are  given  as  boundary  functions  -Bo(£)  and  l?i(£),  while  the  smooth¬ 
ness  of  blending  is  represented  by  boundary  functions  N0(£)  and  Ni(£). 
Therefore,  construction  of  the  wing  section  is  mathematically  equivalent  to 
construction  of  a  smooth  surface  that  satisfies  the  boundary  conditions  (6) 
and  (7).  For  the  inboard  wing  section,  the  boundary  functions  are  given  as 
follows: 


I  x  =  a;(£)  +  Xc\ 

*o(0=  »  =  + 

\*=  m+ZcJ 

/§f  =  f*(0\ 
MO  =  K  =  -Si  , 
Vft  =  o  ) 


I x  —  ^£x(0  xw\ 

Bi(0  =  y  =  MO  -  » 

\z  =  y(t)Ta  +  Zw  ) 

(%  =  Si  M^Ov'iO 


MO  = 


dy  =  1 
dv  V 


V|f  =  -52  sinK)f 


where  Xc,  Ro ,  Hi ,  ZCJ  Bw,  C,  Xw,  r(£),  Ta,  Zw  are  engineering  (or  geometric) 
parameters  for  the  inboard  wing~  x(£)  and  y(£)  are  functions  determined 
by  the  engineering  parameters  (such  as  wing  section  cord  length,  maximum 
camber  location,  etc),  and  5i,  £2  determine  how  smooth  the  inboard  wing 
section  blends  into  the  outboard  wing  section  and  the  fuselage,  respectively. 

One  application  of  the  RAPID  prototyping  model  is  to  modify  the  en¬ 
gineering  parameters  of  an  existing  design  to  obtain  an  optimized  design. 
For  this  purpose,  one  has  to  recover  the  engineering  parameters  from  a  sur¬ 
face  grid  of  an  existing  design  (a  REVERSE  ENGINEERING  PROCESS) 
and  then  to  go  through  a  multidisciplinary  design  optimization  process  to 
get  a  better  design.  In  this  case,  the  reverse  engineering  process  can  also 
be  thought  of  as  a  systems  identification  or  least  squares  problem  involving 
(airplane)  geometry. 

In  order  to  make  the  reverse  engineering  possible,  we  first  derived  the 
canonical  basis  for  the  Hermite  interpolation  problems  involved  in  the  pro¬ 
cess: 


=  0?  -  1)2(1  +  2 t/),  /o,2(j?)  =  J?2(3  -  27/), 

foM  =  v(v  - 1)2,  Mv)  =  v2(v  - 1), 

=  (1  ~  J/K'”'  +  ^(1  -  7/)  sinh(cra7/)  +  ^-7/3^11(077(1  -  7/)), 
fnfliji)  =  7/ean(1-,,)  +  J^(l  -  7/)  sinh(a777/)  +  ^ 7/sinli(<m(l  -  7/)), 
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,  /  v  an.,  .  .  .  .  .  sinh(an)  ...  .,  .. 

SnM  =  -  V)  sma(anrj)  +  — - - r/smh(an(l  -  r) )), 

cLn  an 

p  ,  x  sin h(anV  .  .  ,  /  .  an  .  .  , 

fn,4\V)  = - -7 - (1  -  J/)sinh(onj/)  +  —  77smh(an(l  -  j/)), 

where  n  =  1, 2,  •  •  *  and 

dn  :=  sinh2(an)  -  (an)2, 

pn  :=  an(an  —  1)  —  sinh(an)ean, 

gn  :=  anean  —  (an  -  1)  sinh(an). 


Then  we  used  the  following  reverse  engineering  model  for  surfaces  generated 

by  (4)— (7): 

F(Z,  v)  :=  (o?} /o,i(»?)  +  oj,2)/o,2(f?)  +  oj3)/o,3(»?)  +  a^foAvj) 

m  / 

+  53  (  (°«: ]fnM  +  a^fnM  +  a^fnM  +  o£4) fnAvi)  cos(2n7r£) 

+  (b^fnAn)  +  b^fnM  +  bn}fnM  +  b{4)  fnAvj)  sin(2TO7r^)J 

+  (b0(0  -  41}  -  Zj  (an  J  cos(2n7r£)  +  sin(2n7rO)  j  fm+iAv ) 

+  ~  ajj2)  -  £  (42)  cos(2»irf)  +  bW  sin(2n7r£))^  fm+iAv) 

+  \N0(0  -  Oq3)  -  Z3  (al3)  coB(2mrf)  +  6^)  sm(2nx£))  j  fm+iAv) 

+  ^i(0  -  44)  ~  £  (a«4)  +  bi4)  sin(2nir£)))  fm+iAv)- 

For  example,  the  corresponding  reverse  engineering  process  for  the  inboard 
wing  section  can  be  summarized  in  the  following  algorithm. 


Algorithm  1  For  a  given  wing  surface  grid  {Fij  :  1  <  i  <  fc,  1  <  j  <  /} 
with  i  =  1,  A:  corresponding  to  the  inboard /outborad  wing  intersection  and  the 
wing/fuselage  intersection }  respectively ,  recover  the  engineering  parameters 
for  the  inboard  wing  section  as  follows . 

Step  1.  Use  {Fij  :  1  <  j  <  1}  to  recover  2?o(£)  as  given  in  (8)  ( i.e 
recover  the  engineering  parameters  for  the  inboard/outboard  wing  in¬ 
tersection). 
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Step  2.  Use  {Fk,j  :  1  <  j  <1}  to  recover  J9i(f)  as  given  in  (8)  (i.e.,  recover 
the  engineering  parameters  for  the  wing/fuselage  intersection). 


Step  3*  Estimate  the  parameters  fi ,•••,£*  and  771,  •  •  ry  corresponding  to 
the  surface  grid  (i.e.,  determine  the  original  grid  distribution  used  to 
generate  the  surface  grid). 

Step  4.  Recover  the  Neumann  boundary  conditions  (i.e.,  S\,Si)  by  using 
the  x -components  of  the  surface  grid;  i.e.,  find  Si,  S2  by  solving 

the  following  overdetermined  system  of  linear  equations  with  unknowns 


m 

VlJSl  +  v'fSi  +  jn  +  9nflan]  +  ' ') 

n=0 

m 

+  E  + KiW  +  KiX'1)  - 


n— 1 


for  1  <  i  <  k,  1  <  j  <  l, 


where  v{\  v?,  g**,,  g%2,  g'?%  <&f4,  h'^,  ti£2,  h]fi3,  h%4,  7 are 
constants  given  by  the  following  formulas: 

V1 3  =  \x(Zi)f™+l,3(Vj), 

v'2  =  sin(7r^)y,(^)/m+1,4(77J), 

ffnfr  =  (fnAVj)  ~  /m+l,r(%))  COs(2;rn&), 

hnJ,T  =  ( fn,r(Vj )  -  /m+l,r(»?i))sm(27rn&), 

7»,i  =  xt,j  ~  (*(£»)  +  Xc)  fm+l,l{vj)  ~  fm+lAVj)- 


With  the  above  algorithm,  we  can  recover  the  engineering  parameters 
of  the  inboard  wing  section  with  error  less  than  0.1%  for  a  surface  grid 
generated  by  the  PDE  model  (cf.  Figure  1  for  original  surface  grids  of  the 
fuselage  and  wingand  and  Figure  2  for  the  surface  grids  generated  by  the 
recovered  engineering  parameters). 

2.3  Conclusions 

The  current  reverse  engineering  model  can  be  used  for  the  following  wing 
configuration: 
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•  high  wing  low  aspect  ratio  short  fuselage, 

•  double-delta  transport  with  low  wing, 

•  HSCT  type  configuration, 

•  Delta  wing  configuration. 

Another  direction  is  to  measure  an  existing  wing  tunnel  model  using 
a  laser  digitizer  and  apply  the  reverse  engineering  model  to  the  digitized 
surface  points  to  recover  the  engineering  parameters  according  to  the  RAPID 
model.  The  mathematical  problems  involved  are  the  following: 

•  automatic  identification  of  boundary  points, 

•  parametrization  of  data  points  based  on  boundary  curves. 

The  first  problem  is  to  build  a  mathematical  model  that  can  “realize  the 
human  perception  of  boundary  points  of  a  surface  grid”.  This  is  more  or 
less  a  mathematical  modeling  problem  in  artificial  intelligence.  We  shall  use 
concepts  and  ideas  in  the  projective  geometry  to  find  a  reasonable  solution. 
The  second  problem  is  related  to  accuracy  and  stability  of  a  free  form  surface 
fitting  of  a  data  cloud.  We  will  study  the  potential  of  the  inverse  of  Coons 
mapping  for  finding  a  reliable  parametrization  of  data  points.  This  is  a  very 
difficult  issue.  Note  that,  in  Algorithm  1,  we  had  to  rely  on  the  special 
structure  of  the  underlying  model  to  find  &,  rjj  in  Step  3,  which  is  the  main 
source  of  errors  in  our  reverse  engineering  process.  Major  progress  in  this 
direction  will  be  beneficial  not  only  to  Air  Force,  but  also  to  the  whole 
aerospace  industry. 
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3  Robust  Regression  and  Huber  M-Estimator 

Relationships  between  a  linear  t\  estimation  problem  and  the  Huber  M- 
estimator  problem  can  be  easily  established  by  their  dual  formulations.  The 
least  norm  solution  of  a  linear  programming  problem  studied  by  Mangasar- 
ian  and  Meyer  provides  a  key  link  between  the  dual  problems.  Based  on 
the  dual  formulations,  we  establish  a  local  linearity  property  of  the  Huber 
M-estimators  with  respect  to  the  tuning  parameter  7  and  derive  that  the  so¬ 
lution  set  of  the  Huber  M-esfimator  problem  is  Lipschitz  continuous  with  re¬ 
spect  to  perturbations  of  the  tuning  parameter  7.  As  a  consequence,  the  set 
of  the  linear  l\  estimators  is  the  limit  of  the  set  of  the  Huber  M-estimators 
as  7  — ►  0+.  Thus,  the  Huber  M-estimator  problem  has  many  solutions  for 
small  tuning  parameter  7  if  the  linear  l\  estimation  problem  has  multiple 
solutions.  A  recursive  version  of  Madsen  and  Nielsen’s  algorithm  based  on 
computation  of  the  Huber  M-estimator  is  proposed  for  finding  a  linear  l\ 
estimator.  This  part  is  a  summary  of  a  joint  paper  written  by  J.  Swetits 
and  the  principle  investigator  [16]. 
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3.1  Review  on  i\  Regression  and  Huber  M-estimator 

Consider  the  following  linear  estimation  problem  of  finding  a  vector  x*  in 
Hn  that  solves  the  following  minimization  problem: 

min  ||Ax  —  dlh,  (10) 

X 


where  A  is  m  x  n  (m  >  n)  representing  the  underlying  linear  model,  x  in  ]Rn 
is  the  parameter  vector,  d  €  !Rm  is  the  data  vector,  and  ||z||i  :=  \zi\  is 
the  ^i-norm  of  z.  In  general,  one  might  assume  that  A  has  rank  n  so  that 
different  parameter  vectors  correspond  to  different  regression  functions,  even 
though  this  assumption  is  not  required  in  our  study  of  (10). 

The  linear  l\  estimation  problem  is  much  more  difficult  to  solve  than  the 
linear  maximum  likelihood  estimation  problem,  where  the  ^-norm,  ||z||2  := 

(X^i  zi)  *  •>  is  used  instead  of  the  £i-norm.  See  [1,  2,  3,  7,  18,  25,  27,  29,  30] 
for  various  algorithms  for  solving  the  Unear  l\  estimation  problem.  One 
reason  for  solving  (10)  instead  of  the  corresponding  least  squares  problem  is 
that  Unear  l\  estimators  are  more  robust  than  Unear  least  squares  estimators 
[9, 10]  in  the  presence  of  outUers.  Note  that  (10)  is  referred  as  a  nonsmooth 
optimization  problem  because  the  objective  function  (i.e.,  the  expression  one 
tries  to  minimize)  is  not  a  differentiable  function  of  parameters  Xj,  •  •  -,xn. 
Many  robust  regression  models  allow  one  to  get  a  robust  estimator  without 
involving  nonsmooth  optimization  problems  [10].  The  most  weU-known  one 
is  the  Huber  M-estimator  problem  that  combines  the  robustness  of  the  Unear 
£i  regression  model  with  the  smoothness  of  the  Unear  maximum  UkeUhood 
regression  model  by  using  the  Huber’s  cost  function,  depending  on  a  given 
tuning  constant  7  >  0,  defined  by 


f  if  |«|  <  7 

l  7l*|  ~  \l2'>  otherwise. 


The  Huber  M-estimator  problem  is  to  find  a  vector  x*  in  such  that  x* 
solves  the  following  minimization  problem: 


™  (11) 

t=l 


where  (Ax  —  d)i  denotes  the  2th  component  of  the  vector.  In  general,  there 
is  a  scaling  parameter  r  involved  in  the  Huber  M-estimator  problem.  For 
convenience,  we  assume  that  r  =  1.  Note  that  Huber’s  cost  function  is 
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a  convex  differentiable  piecewise  quadratic  polynomial  and  the  objective 
function  in  (11)  is  differentiable.  The  purpose  of  Huber’s  cost  function  is  to 
measure  (Ax  —  d)i  by  (Ax  —  d)?  (as  in  maximum  likelihood  estimation)  when 
(Ax  —  d)i  is  “relatively  small”  while  it  measures  (Ax  —  d)i  by  |(Ax  —  d)tj  (as 
in  linear  t\  estimation)  when  is  an  outlier  (i.e.,  | (Ax  -  d),j  is  “relatively 
large”). 

Recently,  there  have  been  many  papers  devoted  to  design  of  numerical  al¬ 
gorithms  for  solving  the  Huber  M-estimator  problem  [5,  6,  16,  17,  23,  28],  as 
well  as  to  relationships  between  linear  l\  estimators  and  Huber  M-estimators 
[4,  18,  19].  Madsen  and  Nielsen  [18],  and  Madsen,  Nielsen,  and  Pinar  [19] 
show  that  algorithms  for  computing  Huber  M-estimators  can  also  be  used  to 
find  linear  l\  estimators,  by  establishing  an  explicit  correspondence  between 
linear  t\  estimators  and  Huber  M-estimators.  The  purpose  of  this  section  is 
to  show  that  more  revealing  relationships  between  linear  l\  estimators  and 
Huber  M-estimators  can  be  easily  derived  through  dual  formulations  of  (10) 
and  (11).  More  specifically,  we  obtain  that  the  dual  solution  of  the  Huber 
M-estimator  problem  is  the  least  norm  solution  of  the  dual  linear  program¬ 
ming  problem  of  (10)  when  the  fairing  parameter  7  >  0  is  small  enough. 
The  proof  is  based  on  a  characterization  of  the  least  norm  solution  of  a  lin¬ 
ear  program  proved  by  Mangasarian  and  Meyer  [21,  20],  As  consequences, 
we  not  only  recover  the  known  results  about  linear  l\  estimators  and  Hu¬ 
ber  M-estimators,  but  also  derive  new  ones.  This  allows  us  to  explicitly 
verify  whether  a  given  parameter  7  is  small  enough  to  produce  a  linear  l\ 
estimator.  We  will  propose  a  recursive  version  of  Madsen  and  Nielsen’s  al¬ 
gorithm  for  finding  a  linear  l\  estimator  by  solving  the  corresponding  Huber 
M-estimator  problem. 

It  is  well-known  that  either  (10)  or  (11)  may  have  infinitely  many  so¬ 
lutions.  Sufficient  conditions  for  uniqueness  of  the  linear  l\  estimator  and 
the  Huber  M-estimator  are  given  in  [4,  17,  18,  19].  However,  it  is  not  clear 
whether  there  is  a  connection  between  the  uniqueness  of  the  linear  l\  es¬ 
timator  and  the  Huber  M-estimator.  We  show  that  the  uniqueness  of  the 
Huber  M-estimator  for  every  small  tuning  factor  7  implies  the  uniqueness  of 
the  linear  l\  estimator.  However,  a  counterexample  shows  that  the  converse 
does  not  hold.  That  is,  there  exist  A  and  d  such  that  (10)  has  a  unique  solu¬ 
tion,  but  the  corresponding  Huber  M-estimator  problem  (11)  has  infinitely 
many  solutions  for  every  0  <  7  <  <5,  where  6  is  a  positive  constant. 

The  section  is  organized  as  follows.  Dual  problems  of  (10)  and  (11)  are 
given  in  Subsection  3.2  and  characterizations  of  solutions  of  (10)  and  (11)  are 
also  given  in  terms  of  their  dual  solutions,  respectively.  In  Subsection  3.3,  we 
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first  give  the  Lipschitz  stability  of  X1  with  respect  to  the  perturbations  of  7. 
Then,  based  on  the  dual  characterizations  given  in  the  previous  subsection, 
we  establish  a  local  linearity  property  of  the  solution  set  X1  of  (11)  as  a  set¬ 
valued  mapping  of  the  parameter  7  when  7  >  0  is  small  enough.  Moreover, 
we  obtain  that  the  uniqueness  of  the  Huber  M-estimator  for  small  positive 
tuning  parameter  7  implies  the  uniqueness  of  the  linear  l\  estimator.  The 
converse  is  also  true  under  the  nondegeneracy  assumption  on  the  least  norm 
solution  of  a  dual  problem  of  (10).  But  the  converse  is  not  true  in  general. 
We  give  an  example  of  the  linear  l\  estimation  problem  which  has  a  unique 
solution,  but  the  corresponding  Huber  M-estimator  problem  (11)  does  have 
infinitely  many  solutions  for  0  <  7  <  6  with  some  positive  constant  S.  In 
Subsection  3.4,  a  recursive  algorithm  for  finding  a  linear  t\  estimator  is 
proposed.  The  algorithm  allows  one  to  explicitly  construct  a  solution  of 
(10)  by  solving  finitely  many  Huber  M-estimator  problems.  Conclusions  are 
included  in  Subsection  3.5. 

To  conclude  the  subsection  we  give  some  notations  used  in  this  section. 
If  A  is  a  matrix  and  a:  is  a  (column)  vector,  At*  denotes  the  row  of  A  and 
X{  denotes  the  zth  component  of  x.  (Note  At-x  =  (Ax),*.)  For  a  vector  x  (or 
a  matrix  A),  xT  (or  AT)  is  its  transpose.  For  two  vectors  x  and  z/,  x  <  y 
means  xt-  <  yi  for  every  index  z.  For  convenience,  when  x  is  a  vector  and  a  is 
a  number,  x  <  a  denotes  x,  <  a  for  every  index  i.  We  use  to  denote  the 
vector  whose  ith  component  is  max{- 7,min{7,;z,}}.  The  ^-norm,  ||  *  ||oo, 
of  a  vector  z  €  Htm  is  ||^||oo  =  maxi<t<m  \z{\.  The  set  of  all  solutions  of  (10) 
is  denoted  by  X°,  while  the  set  of  all  solutions  of  (11)  is  denoted  by  X1  for 
any  given  tuning  parameter  7.  The  null  space  of  a  matrix  A  is  defined  as 
{x  €  ]Rn  :  Ax  =  0}. 

3.2  Dual  Formulations 

In  this  subsection  we  give  the  dual  formulations  of  (10)  and  (11)  as  a  linear 
programming  problem  and  a  quadratic  programming  problem,  respectively. 
These  dual  formulations  provide  characterizations  of  linear  l\  estimators 
and  Huber  M-estimators  in  terms  of  their  dual  solutions.  Then,  from  Man- 
gasarian  and  Meyer’s  characterization  of  the  least  norm  solution  of  a  linear 
program  [21,  20],  we  know  that  the  unique  dual  solution  of  (11)  is  actually 
the  least  norm  dual  solution  of  (10)  when  7  >  0  is  small  enough. 

Consider  the  following  dual  formulation  of  (10): 

min  dTy  subject  to  ATy  =  0,  —  1  <  y  <  1.  (12) 
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The  following  lemma  is  essentially  known,  but  the  form  of  the  statement 
given  here  is  more  convenient  for  our  purposes.  For  another  form  of  the 
lemma,  see  [24,  19]. 

Lemma  2  Let  x  €  IR”  and  y  6  IR”1.  Then  x  and  y  are  solutions  of  (10) 
and  (12),  respectively,  if  and  only  if  the  following  conditions  hold: 

ATy  =  0,  -1  <  y  <  1, 

(Ax  -  d)i  >  0,  if  yi  =  1, 

(  lO  ) 

(Ax  -  d)i  <  0,  if  &  =  -1, 

(Ax  -  d){  =  0,  if  —  1  <  yi  <  1. 

Note  that  the  standard  KKT-conditions  for  (12)  involve  a  multiplier  z 
for  the  constraints  —  1  <  y  <  1  and  an  equation  d  —  Ax  —  z  =  0.  Here 
we  use  z  :=  —  ( Ax  —  d )  to  eliminate  z  in  the  above  lemma.  The  following 
characterization  of  linear  t\  estimators  based  on  a  solution  to  (12)  follows 
immediately  from  Lemma  2. 

Corollary  3  Let  y*  be  a  solution  of  (12).  Then  x  is  a  solution  of  (10)  if 
and  only  if  x  satisfies  the  following  conditions: 

(Ax  -  d)i  >  0,  ify*  =  1, 

(Ax  -  d)i  <  0,  ify*  =  -1,  (14) 

(Ax  -  d)i  =  0,  if  -l<y(  <  1. 

The  classical  approximation  theoretic  characterization  of  solutions  to 
(10)  states  that  x  6  lRn  solves  (10)  if  and  only  if  there  exists  y  €  Rm  such 
that  —  1  <  y  <  1,  ATy  —  0,  and  yT(Ax  —  d)  =  || Ax  —  c?|ji  (cf.  [29,  Chapter 
6]).  It  is  easy  to  see  that  these  conditions  are  equivalent  to  (13)  or  (14). 

Now  consider  the  least  norm  solution  of  (12),  which  is  the  unique  solution 
of  the  following  quadratic  program: 

min  dTy  +  ^\\y\\]  subject  to  ATy  =  0,  -1  <  y  <  1,  (15) 

y  z 

when  e  >  0  is  sufficiently  small,  as  shown  in  the  following  lemma  by  Man- 
gasarian  and  Meyer  [21,  20]. 

Lemma  4  Let  yc  be  the  unique  solution  of  the  strictly  convex  quadratic 
program  ( 1 5).  Then  there  exists  8  >  0  such  that  ye  is  the  least  norm  solution 
of  (12)  for  0  <  e  <  8. 
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Here  we  say  that  y*  is  the  least  norm  solution  of  (12)  if  y*  is  the  solution 
of  the  following  strictly  convex  quadratic  programming  problem: 


min 

y£Y* 


where  Y*  is  the  set  of  all  solutions  to  (12).  That  is,  y*  is  the  least  norm 
solution  of  (12)  if  y*  is  the  solution  of  (12)  that  has  the  smallest 
The  least  norm  solution  of  (12)  is  actually  the  unique  common  solution  of 
(12)  and  (15). 


Lemma  5  If  the  solution  y*  of  (15)  is  also  a  solution  of  (12),  then  y*  is 
the  least  norm  solution  of  (12). 

Since  p\t )  =  (t) lc,  the  gradient  of  the  objective  function  in  (11)  is 
At{Ax  -  d) Lc.  Since  (11)  is  a  convex  differentiable  optimization  problem, 
x*  is  a  solution  of  (11)  if  and  only  if  x*  is  a  zero  of  the  gradient  of  the 
objective  function;  i.e.,  x*  is  a  solution  of  the  following  system  of  piecewise 
linear  equations: 

At(Ax  -  d) Lc  =  0.  (16) 

For  easy  reference,  we  give  the  following  lemma  for  the  equivalence  of  (16) 
and  (11). 

Lemma  6  A  vector  x7  is  a  solution  of  (16)  if  and  only  if  x7  is  a  solution 
Of  (11). 

It  is  known  that  (16)  can  be  considered  as  a  dual  problem  of  (15)  [17,  23]. 
Therefore,  (11)  and  (15)  are  dual  problems.  Here  is  the  relation  between 
solutions  of  (15)  and  (11)  [17]. 

Lemma  7  A  vector  ye  is  the  solution  of  (15)  if  and  only  if  yt  =  \(Axe  — 
d) L£,  where  xl  is  a  solution  of  (11). 

Therefore,  in  order  to  get  the  solution  of  (15),  one  only  needs  to  find  a 
Huber  M-estimator  by  solving  (11)  (or  (16)).  Many  algorithms  (including 
Newton  methods,  matrix  splitting  methods,  conjugate  gradient  methods) 
were  developed  for  solving  the  quadratic  program  (15)  by  exploring  its  dual 
structure  (16)  [17,  23,  14]. 

For  a  better  understanding  of  the  duality  between  (11)  and  (15),  we 
recast  Lemma  7  as  follows. 


20 


Lemma  8  Let  x  €  H"  and  y  6  IRm.  Then  x  and  y  are  solutions  of  (11) 
and  (15),  respectively,  if  and  only  if  the  following  conditions  hold: 

ATy  =  0,  -1  <  y  <  1, 

(Ax-dfi  >  7,  ify{  =  1,  _ 

(Ax  -  d)i  <  -7,  if  yi  =  -1,  1 

(Ax  -  d)i  =  7 y,,  if  -  1  <  jr,-  <  1. 

Note  that  the  standard  KKT-conditions  for  (15)  involve  a  multiplier  z 
for  the  constraints  —  1  <  y  <  1  and  an  equation  d  —  ey  —  Ax  —  z  =  0. 
Here  we  use  z  :=  -(Ax  —  d  -  ey)  to  eliminate  z  in  the  above  lemma.  The 
following  characterization  of  the  Huber  M-estimators  based  on  a  solution  to 
(12)  follows  immediately  from  Lemma  8. 

Corollary  9  Let  y*  be  a  solution  of  (15).  Then  x  is  a  solution  of  (11)  if 
and  only  if  x  satisfies  the  following  conditions: 

(Ax  -  d)i  >  7,  ify*{  =  1, 

(Ax  -  dfi  <  -7,  if  y(  =  -1,  (18) 

(Ax  -  dfi  =  73/*,  if  -  1  <  y*  <  1. 

Lemma  5  provides  a  way  to  verify  whether  or  not  the  solution  of  (15) 
is  the  least  norm  solution  of  (12).  By  Lemma  4,  there  exists  a  positive 
constant  6  such  that  the  solution  of  (15)  is  the  least  norm  solution  y*  of  (12) 
for  0  <  7  <  6.  But  we  do  not  know  how  small  6  should  be.  The  following 
lemma  tells  us  that  if  the  solution  of  (15)  is  the  least  norm  solution  y*  for 
some  7  =  fj  >  0,  then  y*  is  also  the  solution  of  (15)  for  any  0  <  7  <  <5. 

Lemma  10  If  the  solution  ■ y6  of  (15)  is  the  least  norm  solution  y*  of  (12) 
for  some  7  =  6  >  0,  then  the  solution  y1  of  (15)  is  the  least  norm  solution 
of  (12)  for  any  0  <  7  <  6  ( i.e y 7  =  y*  for  0  <  7  <  6). 

The  above  lemma  should  be  compared  with  Lemma  3  and  Theorem  4 
of  [19]  wherein  results  are  formulated  in  terms  of  a  “sign  vector”  associated 
with  solutions  to  (16).  The  sign  vector  s7  is  related  to  y7  as  follows:  sf  =  y 7 
if  \y?\  =  1  and  $7  =  0  if  \yf\  <  1.  Madsen,  Nielsen,  and  Pinar  [19]  proved 
that  s7  =  s6  for  0  <  7  <  5  if  6  >  0  is  small  enough. 
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3.3  Linear  l\  Estimator  and  Huber  M-Estimator 

By  Hoffman’s  error  bound  for  approximate  solutions  of  a  system  of  linear 
equalities  and  inequalities,  we  obtain  the  Lipschitz  stability  of  X 7  with 
respect  to  7.  By  using  the  dual  characterizations  given  in  Lemmas  3  and  9,  as 
well  as  Lemma  4,  we  show  that  the  solution  set  X 7  of  (11)  is  a  linear  mapping 
of  the  parameter  7  when  7  >  0  is  small  enough.  Moreover,  we  obtain  that 
the  uniqueness  of  the  Huber  M-estimator  for  small  positive  tuning  parameter 
7  implies  the  uniqueness  of  the  linear  l\  estimator.  The  converse  is  also  true 
under  the  nondegeneracy  assumption  on  the  least  norm  solution  of  (12). 
But  the  converse  is  not  true  in  general.  We  give  an  example  of  the  linear 
t\  problem  which  has  a  unique  solution,  but  the  corresponding  Huber  M- 
estimator  problem  (11)  does  have  infinitely  many  solutions  for  0  <  7  <  6 
with  some  positive  constant  6.  Our  example  is  based  on  a  new  explicit 
characterization  of  the  least  norm  solution  of  (12). 

First,  we  show  the  Lipschitz  stability  of  Huber  M-estimators  with  respect 
to  the  perturbations  of  7.  Let  Xe  be  the  solution  set  of  (11)  and  X°  be  the 
solution  set  of  (10). 

Theorem  11  Let  6  >  0  be  such  that  the  solution  of  (15)  for  7  =  6  is  the 
least  norm  solution  y *  of  (12).  Then  there  exists  a  positive  constant  A  such 
that 

H(X7,Xe)<A*e  for  0<7,7<£,  (19) 

where  H(X,  F)  is  the  Hausdorff  distance  between  X  andY  defined  as 

H(X,F)  :=  max  <  max  min  \\x  —  y||2,maximn  \\y  -  x||2 

l  x£X  yCY  y€Y  x£X 

Note  that  the  above  theorem  shows  that,  for  small  c,  a  Huber  M-estimator 
is  almost  the  same  as  a  linear  l\  estimator  in  the  sense  that  the  difference 
is  a  constant  multiple  of  e. 

Various  relations  between  Xe  and  X°  were  given  by  Madsen,  Nielsen, 
and  Pinar  [19].  However,  these  relations  involve  a  sign  pattern  of  certain 
vectors.  Here  we  give  a  pure  algebraic  relation  between  X7  and  X°. 

Theorem  12  Suppose  that  the  solution  of  (15)  for  some  7  =  5  is  the  least 
norm  solution  y *  of  (12).  Then 

XT  D  for  0<a<r<f3<6,  (20) 

p  —  a  p  —  a  ~ 

where  pX  +  vY  :=  {px  +  vy  :  x  6  X,  y  €  Y}. 
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The  above  theorem  shows  that  X7  is  a  convex  set-valued  mapping  of 
7  for  0  <  7  <  6.  One  important  technical  aspect  of  the  above  theorem  is 
that  we  “know”  how  small  6  should  be.  However,  X7  is  actually  a  linear 
set- valued  mapping  of  7  for  7  >  0  small  enough.  The  drawback  of  the  next 
theorem  is  that  there  is  no  criterion  to  identify  how  small  the  7’s  should  be. 

Theorem  13  If  the  columns  of  A  are  linearly  independent  (i.e.,  the  rank 
of  A  is  n ),  then  there  exists  a  positive  constant  e  such  that 

XT  =  for  0  <a<T</3<e.  (21) 

p  —  a  p  —  a  . 

Let  a  =  0,  (3  =  6,  and  0  <  r  <  6  in  Theorem  12.  By  (20),  we  have  XT  D 
jXs  +  (1  —  j)  X°.  If  XT  =  {xT}  has  only  one  element,  then  X°  =  {x0} 
and  Xs  =  {x5}  are  both  singletons.  Moreover,  xT  =  jxs  +  (1  —  f)  x°.  That 
is,  xT  is  a  linear  function  of  r  for  0  <  r  <  6.  Thus,  we  have  the  following 
consequence  of  Theorem  12. 

Theorem  14  Suppose  that  the  solution  of  (15)  for  some  7  =  6  is  the  least 
norm  solution  of  (12).  If  the  Huber  M-estimator  problem  (11)  has  a  unique 
solution  x7  for  7  >  0  small  enough,  then  the  linear  l\  estimation  problem 
(10)  has  a  unique  solution  x°.  Moreover,  x7  =  jxs  +  (1  —  J)  x°  is  a  linear 
function  of  7  for  0  <  7  <  6. 

It  is  interesting  to  see  that  x°  =  —  jf^x&  for  0  <  7  <  6.  That 

is,  we  can  find  the  unique  linear  l\  estimator  by  two  Huber  M-estimators  of 
different  small  tuning  parameters. 

The  converse  of  the  above  theorem  about  the  uniqueness  is  also  true  if 
the  least  norm  solution  of  (12)  is  nondegenerate.  A  solution  y*  of  (12)  is 
said  to  be  nondegenerate  if  there  exists  a  solution  x  of  (10)  such  that 

(Ax  -  d)i  >  0,  if  y*  =  1, 

(Ax  -  d)i  <  0,  if  1 1*  =  -1,  (22) 

(Ax  -  d)i  =  0,  if  -  1  <  y*  <  1. 

Theorem  15  Suppose  that  the  least  norm  solution  of  (12)  is  nondegenerate 
and  the  linear  l\  estimation  problem  (10)  has  a  unique  solution.  Then  there 
exists  S  >  0  such  that  the  Huber  M-estimator  problem  (11)  has  a  unique 
solution  x£  for  0  <  e  <  6.  Moreover,  $(x)  is  actually  a  strictly  convex 
quadratic  function  in  a  neighborhood  of  x7  for  0  <  7  <  6  and  $(x)  is 
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twice  differentiable  at  xe  (i.e.,  the  Hessian  $"(xc)  exists),  where  $(x)  is  the 
objective  function  in  (11);  i.e., 


m 

sO^EpK^-'O*-]- 

»=i 

Remark.  Let  y1  be  the  solution  of  (15)  and  W{i)  be  the  diagonal  ma¬ 
trix  whose  ith  diagonal  entry  is  1  if  —  1  <  yf  <  1  and  0  otherwise.  Since 
W(^)Az  =  0  has  a  unique  solution  z*  =  0  whenever  ATW(^)A  is  nonsin¬ 
gular,  the  above  proof  implies  that  (11)  has  a  unique  solution  if  AtW{i)A 
is  nonsingular.  This  sufficient  condition  for  uniqueness  of  the  Huber  M- 
estimator  was  observed  by  many  authors  [4,  17,  18,  19]. 

However,  the  above  theorem  is  not  true  without  the  nondegeneracy  as¬ 
sumption.  To  construct  a  counterexample,  we  need  the  following  explicit 
characterization  of  the  least  norm  solution  of  (12). 

Theorem  16  Let  ip  €  !Rm  be  such  that  pTA  —  0,  maxi<t<TO  |y,|  =  1,  and 
there  exists  a  positive  constant  6  such  that,  for  0  <  7  <  S, 


X\p)  := 


jx  €  Rn  : 


<Pi(Ax  -  d)i  >7  if  |v>,-|  =  1,  | 
{Ax  -  d)i  =  (pn  if  \pi\  <  1  / 


Then  cp  is  the  least  norm  solution  of  (12)  and  X 1(<p)  =  X7  for  0  <  7  <  S. 


One  can  construct  an  example  showing  that  the  least  norm  solution  {p 
of  (12)  is  degenerate,  even  though  (12)  has  nondegenerate  solutions.  In  the 
case  that  a  =  0,  the  Huber  M-estimator  problem  (11)  also  has  a  unique 
solution  x 7  with  x'l  =  0  and  x\  —  J  for  small  7,  despite  the  degeneracy 
of  the  least  norm  solution  < p .  Therefore,  the  nondegeneracy  assumption  in 
Theorem  15  is  not  necessary  for  the  uniqueness  of  the  Huber  M-estimator. 


3.4  Computation  of  A  Linear  lx  Estimator 

In  [18],  Madsen  and  Nielsen  proposed  to  compute  a  sign  pattern  vector  s  for 
the  linear  t\  estimation  problem  (10)  by  solving  (11)  for  very  small  tuning 
parameter  7.  Then,  by  solving  a  system  of  linear  inequalities  and  equalities, 
a  linear  t\  estimator  can  be  obtained.  In  this  subsection  we  propose  a 
recursive  version  of  Madsen  and  Nielsen’s  algorithm  for  finding  a  linear 
L\  estimator.  By  the  dual  relationships  given  in  Subsection  3.2,  we  know 
exactly  how  small  7  should  be  in  computation.  Moreover,  unlike  Madsen 
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and  Nielsen’s  approach,  a  linear  t\  estimator  is  explicitly  constructed  by 
using  Huber  M-estimators. 

First  let  us  explain  Madsen  and  Nielsen’s  method  for  computing  a  linear 
l\  estimator  by  solving  Huber  M-estimator  problems.  Suppose  that  we 
have  a  solution  x 7  of  (16)  (or  (11))  for  some  very  small  7.  By  Lemma  7, 
y 7  =  i( Ax 7  —  d)l7  is  a  solution  of  (15).  Then,  by  Lemma  4,  y 7  is  the  least 
norm  solution  of  (12).  By  Corollary  3,  one  can  get  a  linear  l\  estimator 
by  solving  the  feasibility  problem  (14),  a  system  of  linear  inequalities  and 
equalities  with  y*  =  y7. 

In  general,  a  system  of  linear  inequalities  and  equalities  is  not  trivial  to 
solve.  However,  under  the  assumption  of  nonsingularity  of  a  certain  matrix, 
(14)  is  reduced  to  a  system  of  linear  equations.  More  specifically,  let  W(e) 
be  the  mxm  diagonal  matrix  whose  ith  diagonal  entry  is  lif\yf\  <  1  and  0 
if  \yf  \  =  1.  By  Lemma  4,  there  exists  a  positive  number  6  such  that  yt  =  y°, 
the  least  norm  solution  of  (12),  for  0  <  e  <  6.  As  a  result,  W(e)  =  IF(0) 
is  a  constant  matrix  for  0  <  e  <  6.  If  ATW(e)A  is  nonsingular,  then  the 
unique  solution  of  (14)  can  be  found  by  solving  a  system  of  linear  equations 
as  shown  in  the  following  theorem' 

Lemma  17  Let  6  >  0  be  such  that  the  solution  y8  of  (15)  (with  7  =  6)  is 
the  least  norm  solution  of  (12),  and  let  TF(£)  be  the  mxm  diagonal  matrix 
whose  ith  diagonal  entry  is  1  if  \y8\  <  1  and  0  if  \yf\  ~  1.  If  ATW(6) A  is 
nonsingular ,  then  the  Huber  M-estimator  problem  (11)  has  a  unique  solution 
xe  for  0  <  €  <  6  and 

xe  =  x6  +  (e  -  6)z° ,  (23) 

where  z°  is  the  unique  solution  of  the  following  linear  system: 

W(6)Az  =  W(6)y8.  (24) 

Moreover ,  the  linear  t\  estimation  problem  (10)  has  a  unique  solution  x°  = 
x8  —  6z°,  which  is  also  the  unique  solution  of  the  following  linear  system: 


W(6)Ax  =  W(6)d.  (25) 

In  general,  from  (14),  we  know  that  all  solutions  of  (10)  are  in  the  set 
X(y~<)  :=  {x6En:  (Ax  -  d)i  =  0  if  \y?\  <  1}. 

Therefore,  we  consider  the  following  minimization  problem: 


min 

xexiy'i) 


J2p[(Ax  -  d)i], 

1=1 


(26) 
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which  is  equivalent  to  a  system  of  piecewise  linear  equations  similar  to  (16): 

AT(Ax  -  d)l7  =  0,  (27) 

where  x  =  x1+Tx,  T  is  a  transformation  matrix  with  appropriate  order,  and 
x  €  IRn  with  n  <  n.  Now  we  consider  that  (27)  is  the  Huber  M-estimator 
problem  (cf.  (16))  corresponding  to  a  “simpler”  linear  l\  estimation  problem 
and  try  to  find  a  solution  of  the  new  linear  l\  estimation  problem.  One 
important  relation  between  these  two  t\  estimation  problems  is  that  if  x*  is 
a  solution  of  the  new  i\  estimation  problem,  then  x*  =  x1  +Tx*  is  a  solution 
of  the  original  i\  estimation  problem.  Therefore,  the  problem  is  reduced  to 
finding  a  solution  of  a  “simpler”  problem.  After  repeating  this  process  a  few 
times,  we  see  that  the  condition  in  Lemma  17  will  be  satisfied  for  the  final 
Huber  M-estimator  problem  and  then  we  can  get  a  solution  of  the  final  linear 
l\  estimation  problem  by  solving  a  system  of  linear  equations.  By  doing  so, 
we  can  explicitly  construct  a  linear  l\  estimator  of  (10)  by  using  the  Huber 
M-estimators  and  transformation  matrices  obtained  in  the  process. 

The  above  recursive  method  of  computing  a  linear  l\  estimator  is  for¬ 
mulated  in  the  following  algorithm. 

Algorithm  18  Let  k  =  0,  T°  =  I  (the  n  x  n  identity  matrix),  d°  =  d, 
x°  =  0,  and  A0  —  A.  Compute  a  solution  of  (10)  as  follows: 

Step  1.  find  e  >  0  and  a  solution  uk  of 

{Ak)T  (Aku-dky_^  =  0  (28) 

such  that  vk  :=  ((Akuk  —  dk)e_e  is  a  solution  of 

nun  ( dk)Tv  subject  to  ( Ak)Tv  =  0,  —  1  <  v  <  1;  (29) 

Step  2.  let  Wk  be  the  diagonal  matrix  whose  ith  diagonal  entry  is  1  if 
\vk |  <  1  and  0  otherwise; 

Step  3.  if  k  >  0  and  Wk  =  Wk~x,  let  xk+1  :=  xk  +  Tkuk  and  go  to  Step  9; 
Step  4.  find  a  solution  zk  of  the  linear  system 

WkAkz  =  Wkdk ;  (30) 


Step  5.  let  xk+1  :=  xk  +  Tkzk; 
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Step  6.  if  (Ak)TWkAk  is  nonsingular,  then  go  to  Step  9; 

Step  7.  let  the  columns  of  Bk  be  a  basis  of  the  null  space  ofWkAk; 

Step  8.  set  Ak+1  :=  AkBk,  Tk+1  =  TkBk,  dk+1  :=  dk  -  Akzk,  k  :=  k  +  1, 
and  go  to  Step  1; 

Step  9.  output  xk+1  as  a  solution  of  (10). 

This  algorithm  finds  a  linear  i\  estimator  in  finitely  many  iterations  as 
shown  in  the  following  theorem. 

Theorem  19  For  any  A  and  d ,  Algorithm  18  finds  a  solution  of  the  linear 
t\  regression  problem  (10)  in  finitely  many  iterations. 

3.5  Conclusions 

We  show  that  many  new  relationships  between  the  linear  l\  estimators  and 
the  Huber  M-estimators  can  be  derived  once  one  understands  that  the  dual 
solution  of  the  Huber  M-estimator  problem  (11)  with  a  small  tuning  pa¬ 
rameter  7  is  the  least  norm  solution  of  the  dual  problem  of  the  linear  l\ 
estimation  problem  (10).  The  dual  connection  between  (11)  and  (10)  allows 
us  to  establish  a  set-valued  version  of  the  linearity  property  of  the  Huber 
M-estimator  with  respect  to  the  tuning  parameter  7,  as  well  as  the  Lipschitz 
stability  of  the  Huber  M-estimators  with  respect  to  perturbations  of  7.  It 
also  allows  us  to  identify  how  small  the  tuning  parameter  should  be  in  the 
computation  of  the  Huber  M-estimators  for  finding  a  linear  t\  estimator. 
By  exploiting  the  dual  connection,  we  have  an  almost  complete  understand¬ 
ing  of  how  the  uniqueness  of  the  Huber  M-estimator  with  a  small  tuning 
parameter  is  related  to  the  uniqueness  of  the  linear  l\  estimator. 

Our  study  also  provides  a  new  interpretation  of  the  role  of  the  Huber 
M-estimator  in  Madsen  and'  Nielsen’s  method  for  computing  a  linear  l\  es¬ 
timator.  In  the  case  that  the  least  norm  solution  of  (12)  is  nondegenerate, 
all  common  zero  indices  of  the  error  vectors  ( Ax°  —  b )  (with  :r°  in  the  so¬ 
lution  set  of  (10))  can  be  identified  by  the  Huber  M-estimator  with  a  small 
tuning  parameter  7  and  a  linear  t\  estimator  can  be  explicitly  constructed 
by  solving  another  Huber  M-estimator  problem.  However,  when  the  least 
norm  solution  of  (12)  is  degenerate,  we  have  to  repeatedly  solve  Huber  M- 
estimator  problems  in  order  to  find  the  common  zero  indices.  Therefore,  we 
propose  a  recursive  version  of  Madsen  and  Nielsen’s  method  for  computing  a 
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linear  l\  estimator  by  repeatedly  solving  Huber  M-estimator  problems.  The 
new  algorithm  explicitly  constructs  a  linear  i\  estimator  based  on  Huber  M- 
estimators,  without  Madsen  and  Nielsen’s  assumption  that  guarantees  the 
uniqueness  of  both  the  Huber  M-estimator  and  the  linear  l\  estimator. 
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4  Global  Error  Bounds  for  Quadratic  Inequali¬ 
ties 

In  this  section  we  report  our  study  on  convex  differentiable  inequalities  and 
show  that  metric  regularity  and  Abadie’s  constraint  qualification  are  equiv¬ 
alent  for  such  inequalities.  For  convex  quadratic  inequalities,  we  show  that 
metric  regularity,  the  existence  of  a  global  error  bound,  and  Abadie’s  con¬ 
straint  qualification  are  mutually  equivalent.  As  a  consequence,  we  derive 
two  new  characterizations  of  weak  sharp  minima  of  a  convex  quadratic  pro¬ 
gramming  problem.  This  is  a  summary  of  a  paper  written  by  the  principle 
investigator  [16]. 

4.1  Review  on  Global  Error  Bounds 

Consider  a  nonempty  convex  subset  S  of  lRn  defined  by  the  following  convex 
inequalities: 

9{x)  <  0,  (31) 

where  g(x)  is  a  mapping  from  ]Rn  to  Rm  and  each  component  gi(x)  of  g(x) 
is  a  convex  function  on  IRn.  Most  likely  one  has  to  resort  to  some  iterative 
method  for  finding  an  approximate  solution  of  (31).  One  important  crite¬ 
rion  for  accuracy  of  an  approximate  solution  x  is  the  amount  of  constraint 
violation:  ||(</(a;))+||.  Here  z+  is  a  vector  whose  ith  component  is  max{0,  Z{} 
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and  ||  •  ||  denotes  the  2-norm  on  IRm  (i.e.,  ||z||2  =  £”=i  lx»|2)-  There  are 
both  practical  and  theoretical  reasons  for  studying  the  following  estimate  of 
the  distance  from  any  point  x  in  Etn  to  the  feasible  set  S: 

dist(x,  5)  <  7  •  ||(<7(a;))+||,  (32) 

where  7  is  a  positive  constant  and  dist(x,  S )  :=  miny€s  ||x  —  y ||.  When  g(x ) 
is  linear,  (32)  is  Hoffman’s  error  estimate  for  approximate  solutions  of  a 
system  of  linear  inequalities  [8].  Error  estimate  was  crucial  for  establish¬ 
ing  linear  convergence  of  various  descent  methods  for  solving  linearly  con¬ 
strained  optimization  problems  [19,  20,  21,  22,  23,  24,  26, 17, 11,  13,  14,  15]. 
From  a  practical  point  of  view,  (32)  guarantees  that  the  distance  from  an 
approximate  solution  x  to  S  is  bounded  by  a  multiple  of  ||({7(x))+||,  an  ex¬ 
plicit  measurement  of  infeasibility.  Roughly  speaking,  one  might  expect  that 
dist(x,  S )  decreases  proportionally  as  ||(#(x))+||.  However,  the  proportional 
constant  7  might  be  large  and  result  in  an  undesirable  situation:  ||(5(^))+|| 
is  quite  small  but  x  might  be  far  away  from  the  feasible  set  S .  This  is  similar 
to  the  ill-conditioning  of  a  system  of  linear  equations.  Therefore,  in  order  to 
know  the  accuracy  of  an  approximate  solution  in  terms  of  its  distance  to  the 
feasible  set,  it  is  important  to  know  what  is  the  exact  value  of  7  in  estimation 
(32).  Mangasarian  defined  the  conditioning  number  of  the  inequality  system 
(31)  as  the  smallest  7  for  which  estimation  (32)  holds  for  all  x  [25].  There 
are  quite  a  few  papers  devoted  to  the  study  of  the  conditioning  number  of 
a  system  of  linear  equalities  and  inequalities  [6,  27,  4,  12,  5,  9,  10]. 

Generally,  (32)  does  not  hold  if  g(x)  is  not  linear.  Robinson  proved 
that  (32)  holds  if  S  is  bounded  and  has  a  nonempty  interior  [28].  For 
an  unbounded  feasible  set  5,  Mangasarian  [25]  established  (32)  under  the 
assumption  that  g%{x)  are  differentiable  convex  functions  and  (31)  satisfies 
Slater’s  condition  (i.e.,  there  exists  a  point  x*  such  that  g(x*)  <  0)  as  well 
as  an  asymptotic  constraint  qualification.  Auslender  and  Crouzeix  extended 
both  Robinson  and  Mangasarian’s  results  by  introducing  a  more  general 
asymptotic  constraint  qualification  that  can  be  applied  to  nondifferentiable 
convex  functions  gi(x).  They  derived  (32)  under  Slater’s  condition  and 
their  asymptotic  constraint  qualification  [2].  However,  asymptotic  constraint 
qualifications  are  difficult  to  verify.  It  was  not  clear  from  Auslender  and 
Crouzeix’s  result  whether  or  not  (32)  holds  if  gi(x)  are  convex  quadratic 
functions.  It  was  proved  recently  by  Luo  and  Luo  [18]  that  (32)  holds  if 
gi(x)  are  convex  linear/quadratic  functions  and  there  exists  a  feasible  point 
x *  of  (31)  such  that  gi(x*)  <  0  whenever  gi(x)  is  not  linear.  That  is,  for 
convex  linear/ quadratic  functions,  (32)  holds  when  Slater’s  condition  holds 
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for  nonlinear  constraints.  Shortly  after,  Pang  and  Wang  showed  that  (32) 
might  not  hold  for  convex  quadratic  inequalities  if  Slater’s  condition  fails 
[32].  They  introduced  an  interesting  concept  called  the  degree  of  singularity 
of  an  inequality  system  and  proved  that,  if  gi(x)  are  convex  linear /quadratic 
functions  and  the  degree  of  singularity  of  (31)  is  d,  then 

dist(r,  5)  <  p  •  (||($(a;))+||  +  ||(s(:c))+||2~d)  for  x  €  Hn.  (33) 

They  also  showed  by  examples  that  the  above  estimate  is  sharp  in  the  sense 
that  there  is  a  convex  quadratic  inequality  system  for  each  d  =  0, 1,  •  •  •  such 
that  [32] 

dist(r,  5) 

IIOMMI  +  ItoMWr'  >  ■ 

Note  that  the  degree  of  singularity  of  (31)  is  always  bounded  by  (m  +  1). 
Therefore,  (33)  always  holds  with  d  =  m  +  1  [32].  This  provides  a  gen¬ 
eral  error  bound  for  approximate  solutions  of  a  convex  quadratic  inequality 
system,  even  though  it  might  not  be  as  sharp  as  one  expects. 

From  Luo-Luo  and  Pang-Wang’s  works  we  can  appreciate  the  impor¬ 
tance  of  Slater’s  condition  in  error  estimate  (32)  for  approximate  solutions 
of  a  convex  quadratic  inequality  system.  However,  one  can  easily  construct 
a  convex  quadratic  inequality  system  that  satisfies  (32)  but  does  not  sat¬ 
isfy  Slater’s  condition:  g\(x\,X2)  =  X\  +  Z2,  <72(^15 £2)  =  —(xi  +  £2)5  and 
9z{xi'>x2)  =  (#i  +  x2)2*  (It  is  a  trivial  case  since  the  nonlinear  constraint 
is  superfluous.  For  nontrivial  examples,  see  Subsection  4.4.)  This  simple 
example  raises  a  natural  question:  what  is  the  characterization  of  a  convex 
quadratic  inequality  system  that  satisfies  (32)?  It  was  this  question  that  led 
us  to  the  discovery  of  some  intrinsic  connections  among  several  seemingly  un¬ 
related  concepts:  Abadie’s  constraint  qualification,  metric  regularity,  global 
error  bounds,  and  weak  sharp  minimum  property. 

The  section  is  organized  as  follows.  In  Subsection  4.2,  we  give  a  de¬ 
tailed  discussion  of  Abadie’s  constraint  qualification,  since  it  plays  a  key 
role  in  this  section.  The  main  result  in  Subsection  4.3  is  the  equivalence 
of  Abadie’constraint  qualification  and  metric  regularity  for  a  differentiable 
convex  inequality  system.  In  Subsection  4.4,  we  apply  this  characterization 
of  metric  regularity  to  derive  a  characterization  of  a  convex  quadratic  in¬ 
equality  system  that  satisfies  (32):  error  estimate  (32)  holds  if  and  only  if 
Abadie’s  constraint  qualification  is  satisfied  at  every  feasible  point.  Since  we 
can  reformulate  a  constrained  minimization  problem  as  an  inequality  sys¬ 
tem,  weak  sharp  minimum  property  may  be  considered  as  a  weaker  form  of 
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error  estimate  (32).  From  this  point  of  view,  we  establish  two  new  charac¬ 
terizations  of  weak  sharp  minimum  property  of  a  convex  quadratic  program. 
Finally,  conclusion  is  included  in  Subsection  4.5. 

4*2  Abadie’s  Constraint  Qualification 

In  this  subsection,  we  review  constraint  qualifications  for  (31),  especially, 
Abadie’s  constraint  qualification.  First,  Abadie’s  constraint  qualification  is 
a  representation  of  the  tangent  cone  by  the  gradients  of  active  constraints, 
which  can  also  be  described  by  a  representation  of  the  normal  cone  by  the 
gradients  of  active  constraints.  For  convex  differentiable  optimization  prob¬ 
lems,  Abadie’s  constraint  qualification  is  the  weakest  condition  that  ensures 
the  characterization  of  an  optimal  solution  by  Karush-Kuhn- Tucker  condi¬ 
tions. 

For  a  point  £  in  a  convex  set  S',  the  normal  cone  of  S  at  x  is  defined  by 

N(x )  :=  {z  £  lRn  :  zT(y  -  x)  <  0  for  y  £  5}. 

The  tangent  cone  T(x)  of  S  at  x  is  the  polar  of  the  normal  cone  N(x).  That 
is,  y  £  T(x)  if  and  only  if  yTz  <  0  for  every  z  £  N(x).  The  tangent  cone 
T(x)  can  also  be  defined  as  the  closed  convex  cone  generated  by  the  elements 
in  S  —  x. 

Definition  20  We  say  that  the  system  (31)  satisfies  Abadie’s  constraint 
qualification  (Abadie’s  CQ)  x  £  S  ifT(x)  =  {y  £  !Rn  :  gfi(x)Ty  <  0  for  i  £ 
I },  where  I  :=  { i  :  gi(x)  =  0}  is  the  set  of  indices  of  active  constraints  at 
x.  If  Abadie’s  CQ  holds  at  every  point  in  S,  then  we  say  that  (31)  satisfies 
Abadie’s  CQ. 

By  duality,  we  can  also  use  the  normal  cone  to  describe  Abadie’s  con¬ 
straint  qualification. 

Lemma  21  For  the  inequality  system  (31),  Abadie’s  constraint  qualification 
is  satisfied  at  a  point  x  £  S  if  and  only  if  the  normal  cone  of  S  at  x  is 


where  I  :=  {i  :  gi(x)  =  0}  is  the  index  set  of  active  constraints  at  x. 
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The  following  relation  about  various  constraint  qualifications  is  well- 
known,  which  implies  that  Abadie’s  constraint  qualification  is  the  weakest 
one  among  them. 

Lemma  22  Consider  the  following  constraint  qualifications  at  a  point  x  E 
S:  . 

(LICQ):  {</<(*) :  i  €  /}  is  linearly  independent, 

(SCQ):  there  exists  x*  such  that  p,(x*)  <  0  for  i  =  1,  •  •  • ,  m, 

(MFCQ):  there  exists  a  vector  u  such  that  g[{x)Tu  >  0  for  i  £  I, 

(ACQ)s  the  tangent  cone  of  S  at  x  is  jy  €  IRn  :  g'i(x)Ty  <  0  for  i  G  /  j  , 

where  I  :=  {i  :  p,(x)  =  0}  is  the  index  set  of  active  constraints  at  x.  Then 

(LICQ)  =>  (SCQ)  (MFCQ)  =►  (ACQ). 

Now  we  show  that  Abadie’s  constraint  qualification  is  the  weakest  con¬ 
dition  that  ensures  the  characterisation  of  an  optimal  solution  of  a  convex 
differentiable  optimization  problem. 

Consider  the  inequality  system  (31)  and  the  following  convex  program: 

min  f(x)  subject  to  gi(x)  <0  for  i  =  1,  •  •  • ,  m.  (34) 

where  f(x)  is  a  differentiable  convex  function  on  ]Rn.  We  say  that  x*  is  a 
Karush-Kuhn- Tucker  point  (KKT  point)  of  (34)  if  there  exist  nonnegative 
scalars  A,-  such  that 

/V)  +  Ea*5.V)  =  0, 

where  I  :=  {i  :  gi(x)  =  0}  is  the  index  set  of  active  constraints  at  z*. 
Lemma  23  The  following  two  statements  are  equivalent . 

(23.1)  The  system  (31)  satisfies  Abadie’s  constraint  qualification. 

(23.2)  For  any  strictly  convex  quadratic  function  f(x)  on  ]Rn,  x*  is  the 
optimal  solution  of  (34)  if  and  only  if  x*  is  a  KKT  point  of  (34)- 

Remark.  In  a  sense,  the  above  lemma  shows  that  Abadie’s  constraint 
qualification  is  the  weakest  CQ  to  guarantee  that  KKT  conditions  hold  for 
optimal  solutions. 

Finally  we  show  that  a  commonly  used  Slater-type  constraint  qualifica¬ 
tion  implies  Abadie’s  constraint  qualification. 
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Lemma  24  [31,  Theorem  28.2]  Suppose  that  there  exists  a  point  x*  such 
that  gi(x *)  <  0  for  i  =  1  and  gi(x*)  <  0  if  gi(x)  is  not  a  linear 

function  of  x.  Then  x*  is  the  optimal  solution  of  (34)  if  and  only  if  x*  is  a 
KKT  point  of  (34)- 

Lemma  25  Suppose  that  there  exists  a  point  x*  such  that  gi(x*)  <  0  for 
i  =  1,  •  •  • ,  m  and  g%(x*)  <  0  if  gi(x)  is  not  a  linear  polynomial  of  x.  Then 
(31)  satisfies  Abadie’s  constraint  qualification . 

4.3  Metric  Regularity  and  Abadie’s  Constraint  Qualification 

It  is  well-known  that  metric  regularity  is  related  to  Slater  condition  and 
MFCQ  [28,  29,  30].  In  this  subsection  we  prove  that  Abadie’s  CQ  is  equiv¬ 
alent  to  metric  regularity  for  a  convex  differentiable  inequality  system. 

Definition  26  We  say  that  the  system  (31)  is  metrically  regular  at  a  point 
x  €  S  if  there  exist  positive  constants  7  and  6  such  that 

m 

dist(x,  S)  <  7  •B*  i(x))+  when  \\x  -  x\\  <  6. 

t=i 

We  say  that  the  system  (31)  is  metrically  regular  if  it  is  metrically  regular 
at  every  point  in  S . 

Note  that  we  are  interested  in  metric  regularity  of  (31)  at  every  point  in 
5.  In  general,  one  needs  Slater  condition  to  ensure  such  a  metric  regularity, 
as  shown  in  the  following  lemma  first  proved  by  Robinson  [28]. 

Lemma  27  If  there  exists  x*  £  lRn  such  that  gi(x*)  <  0  for  i  =  1,  •  •  - 
then  (31)  is  metrically  regular . 

Metric  regularity  is  closely  related  to  error  bounds.  In  fact,  metric  reg¬ 
ularity  of  (31)  is  equivalent  to  error  bounds  for  infeasible  solutions  of  (31) 
on  bounded  subset  of  IRn. 

Lemma  28  The  system  (31)  is  metrically  regular  if  and  only  if  for  any 
number  r  >  0,  there  exists  a  positive  constant  7 (r)  such  that 

m 

dist(x,  S)  <  7 (r)  •  ^P(<7;(x))+  when  ||x||  <  r. 
i=l 

Now  we  state  the  main  theorem  in  this  subsection. 

Theorem  29  The  system  (31)  is  metrically  regular  if  and  only  if  (31)  sat¬ 
isfies  Abadie’s  constraint  qualification. 
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4.4  Error  Bounds 

We  want  to  apply  the  main  theorem  in  the  previous  subsection  (Theorem 
29)  to  a  special  case:  (31)  with  convex  linear/quadratic  functions  gi(x).  In 
this  case,  the  metric  regularity  is  equivalent  to  the  existence  of  a  global 
error  bound  for  infeasible  solutions  of  (31).  As  a  consequence,  we  obtain 
that  Abadie’s  constraint  qualification  is  a  necessary  and  sufficient  condition 
for  a  global  error  bound  given  in  (32).  Our  result  complements  the  study 
done  by  Luo  and  Luo  [18],  Pang  and  Wang  [32]  on  error  bounds  for  convex 
quadratic  inequalities. 

Consider  the  following  system  of  convex  quadratic  inequalities: 

gi{x)  <  0  for  i  =  1,-  •  -,m,  (35) 

where  gi(x)  are  either  linear  or  convex  quadratic  functions  on  lRn. 

The  essence  of  our  proof  is  to  reduce  the  problem  to  the  case  that  Slater 
condition  holds.  Then  we  can  use  the  following  result  by  Luo  and  Luo  [18] 
to  get  (32). 

Lemma  30  If  the  system  (35)  satisfies  the  Slater  condition,  then  there  ex¬ 
ists  a  positive  constant  7  such  that 


Xi9i(x) 

iei(x) 


>7  E  A*' 
iel(x) 


where  I(x )  :=  {i :  gi(x)  =  0}. 


for  x  £  1R",  A i  >  0, 


(36) 


It  is  obvious  that  (32)  implies  metric  regularity.  Therefore,  the  main 
effort  in  proving  the  equivalence  of  (32)  and  metric  regularity  is  to  show 
that  metric  regularity  implies  (32)  for  convex  quadratic  inequalities. 

Theorem  31  The  convex  quadratic  inequality  system  (35)  satisfies  Abadie’s 
constraint  qualification  if  and  only  if  there  exists  a  positive  constant  7  such 
that 

m 

dist(x,  S)  <  7  X^*-(  x))+  for  x  £  IR",  (37) 

<=i 

where  S  :=  {x  £  IR"  :  <7,(2)  <0  for  i  =  1,  •  •  •,  m }. 

Note  that  Luo  and  Luo  [18]  proved  a  special  case  of  Theorem  31:  (36) 
holds  if  there  exists  a  vector  x *  such  that  g(x *)  <  0  and  <7,(2*)  <  0  whenever 
5,(2)  is  not  a  linear  function. 
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Theorem  31  not  only  gives  a  characterization  of  the  existence  of  global 
error  bound  (32)  for  convex  quadratic  inequalities,  but  also  reveals  why  there 
exist  weak  sharp  minima  for  convex  quadratic  programming  problems  [7], 
as  shown  in  the  following  theorem. 

Theorem  32  Assume  that  f(x)  is  a  convex  quadratic  function  bounded  be¬ 
low  on  {x  G  IRn  :  Ax  <  b}.  Let  Lin  :=  min^x<fc  f(x)  and  S  :=  {x  G  !Rn  : 
Ax  <  &, /(x)  =  /min}-  Then  the  following  statements  are  equivalent . 

(32.1)  Abadie’s  constraint  qualification  is  satisfied  at  every  feasible  point  of 
the  following  inequality  system: 

f(x)  -  /min  <  0  and  Ax  -  b  <  0.  (38) 

(32.2)  The  convex  quadratic  programming  problem  min Ax<b  f(x)  has  weak 
sharp  minima .  That  is}  there  exists  a  positive  constant  7  such  that 

f(x)  >  /min  +j  •  dist(x,  S )  when  Ax  <  b.  (39) 

(32.3)  There  exists  a  positive  constant  A  such  that 

dist(x,  S)<  A  -  /min)+  +  W(Ax  -  6)+||)  for  x  G  (40) 

Note  that  various  characterizations  of  weak  sharp  minima  of  a  convex 
quadratic  programming  problem  were  given  by  Ferris  and  Mangasarian  [7]. 
Theorem  31  leads  us  to  two  new  characterizations  (32.1)  and  (32.3)  in  The¬ 
orem  32.  Weak  sharp  minimum  inequality  (39)  estimates  how  far  away 
a  feasible  solution  is  from  the  solution  set.  The  inequality  (40)  actually 
provides  an  estimate  of  the  distance  from  any  approximate  solution  of  the 
quadratic  programming  problem  to  its  solution  set,  which  is  more  desirable 
when  infeasible  approximate  solutions  are  involved.  Even  though  (40)  fails 
to  be  true  if  (38)  does  not  satisfy  Abadie’s  constraint  qualification,  one  could 
still  have  the  following  inequality  [15]: 

dist(z,  S)<  7  (/(*)  -  /min  +  f  7{x)  -  /mil)  for  Ax  -  b  <  0. 
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4.5  Conclusion 


We  have  shown  that  the  concepts  of  metric  regularity,  error  bounds,  and 
weak  sharp  minimum  are  closely  related.  The  essence  of  these  concepts 
is  to  estimate  the  distance  from  an  approximate  solution  to  the  solution 
set  of  the  underlying  problem,  locally  or  globally.  For  metric  regularity  of 
parametric  systems,  MFCQ  was  proven  to  be  a  necessary  and  sufficient  con¬ 
dition  [29,  30].  However,  for  the  non-parametric  version  of  metric  regularity 
defined  here.  Abadie’s  constraint  qualification  is  a  necessary  and  sufficient 
condition.  As  applications,  we  show  that  Abadie’s  constraint  qualification  is 
a  characterization  for  the  existence  of  a  global  error  bound  (32)  for  convex 
quadratic  inequalities,  which  leads  to  a  global  error  bound  (40)  for  approxi¬ 
mate  solutions  of  a  convex  quadratic  programming  problem  with  weak  sharp 
minima. 
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5  Merit  Functions  for  Complementarity  Problems 

We  introduce  a  new  merit  function  PQ(x )  for  a  symmetric  linear  complemen¬ 
tarity  problem  (symmetric  LCP).  The  merit  function  Pa(x )  is  derived  from 
Hestenes-Powell-Rockafellar’s  quadratic  augmented  Lagrangian  function  for 
a  quadratic  programming  problem  with  simple  bound  constraints.  The  sta¬ 
tionary  points  of  Pa(x)  are  the  solutions  of  the  original  LCP.  We  study 
various  properties  of  Pa(x ),  including  existence  of  global  minimizers,  error 
bounds,  boundedness  of  level  sets,  and  convexity  of  Pa(x).  A  relation  be¬ 
tween  Pa(x)  and  Mangasarian  and  Solodov’s  implicit  Lagrangian  Ma(x)  [27] 
is  established.  As  a  consequence,  we  recover  Peng’s  result  [30]  about  strict 
convexity  of  Ma{x)  for  large  a  and  strongly  monotone  LCP.  A  Newton-type 
method  is  proposed  to  compute  a  solution  of  the  original  LCP  by  finding  a 
stationary  point  of  Pa(x).  If  PQ(x)  is  bounded  below  and  the  matrix  in  LCP 
is  symmetric  and  nondegenerate,  then  the  algorithm  finds  a  solution  of  LCP 
in  finitely  many  iterations.  We  also  discuss  possible  extension  to  symmetric 
nonlinear  complementarity  problem.  This  is  a  summary  of  a  paper  written 
by  the  principle  investigator  [17]." 


5.1  Review  on  Merit  Functions 


Consider  the  following  quadratic  program  with  simple  bound  constraints: 


min  -xtQx  +  qTx, 

Kx<u  2 


(41) 


where  Q  is  an  n  X  n  symmetric  matrix,  q  €  1R”  (a  vector  of  n  components), 
and  /,  u  are  vectors  of  n  components  with  l  <  u.  Some  components  of  l  or 
u  may  be  — oo  or  +oo.  The  corresponding  augmented  Lagrangian  function 
X(x,t/,  a)  introduced  independently  by  Hestenes  [10,  11]  and  Powell  [32]  for 
equality  constraints  and  by  Rockafellar  [33,  34]  for  inequality  constraints 
can  be  written  in  the  following  unified  way  [15]: 


L(x,y,  a)  =  \xtQx  +  qTx  +  f  II  (±(x  -  u)  -  y)  + 

|(^  — *)  +  y)J2-!NI2, 


(42) 


where  y  is  the  Lagrangian  multiplier  corresponding  to  two-sided  inequal¬ 
ity  constraints  and  a  is  a  penalty  parameter.  Note  that  the  Lagrangian 
multiplier  y  in  (42)  should  satisfy  the  following  equation: 


y  =  -{Qx  +  ?)• 


(43) 
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By  substituting  (43)  into  (42),  we  get  a  piecewise  quadratic  penalty  function 
of  (41)  [15]: 


l(x,-(q  +  Qx),a )  =  (±xTQx  +  qTx)  -  §  ||(Qa  +  «)||2, 


+  — 
~2a 


((/  -  x)  +  a(Qx  +  q))  +  +  ^  ((a;  -  u)  -  a(Qx  +  q))+ 


For  quadratic  programs  with  nonnegative  constraints  (/,  =  0,  u,  —  +oo),  we 
have  the  following  penalty  function: 


(a(gx  +  q)  -  *) 


where  is  the  vector  whose  ith  component  is  max{0,2:}.  An  equivalent 
form  of  Pa(x)  was  first  introduced  by  Li  and  Swetits  [18,  15]. 

In  this  section,  we  show  that  the  augmented  Lagrangian  Pa(x)  can  be 
used  as  a  merit  function  for  the  symmetric  linear  complementarity  problem: 

x  >  0,  (' Qx  +  q)>  0,  xt(Qx  +  q)  =  0.  (44) 


Let  the  penalty  parameter  a  be  such  that  0  <  a\\Q\\  <  1.  Then  x*  is 
a  stationary  point  of  Pa(x)  if  and  only  if  x*  is  a  solution  of  LCP  (44). 
Moreover,  we  give  characterizations  for  the  existence  of  a  global  minimizer 
of  Pa(x ),  boundedness  of  level  sets  of  Pa(ar),  and  the  convexity  of  Pa(x ). 
We  also  derive  local/global  error  bounds  in  terms  of  Pa(z). 

One  very  interesting  property  of  PQ(x)  is  its  connection  with  Mangasar- 
ian  and  Solodov’s  implicit  Lagrangian:  Ma(x)  =  Pl(x)  —  Pa(x)  for  LCP. 

a 

Based  on  convexity  analysis  of  Pa(^),  we  obtain  that  Ma(x)  is  strictly  con¬ 
vex  if  a  is  large  and  Q  is  positive  definite,  recovering  a  result  by  Peng  [30]. 

Since  Pa(x)  is  a  differentiable  piecewise  quadratic  function  and  its  gra¬ 
dient  VPQ(x)  =  A(J  —  aQ)(x  —  (x  —  a(Qx  +  q))+)  has  a  simple  struc¬ 
ture,  it  is  very  easy  to  design  an  iterative  method  that  finds  a  stationary 
point  of  PQ(x)  (or  a  solution  of  (44))  in  finitely  many  iterations.  Such 
an  algorithm  is  very  interesting,  since  almost  all  algorithms  for  quadratic 
programs  and  linear  complementarity  problems  can  be  classified  as  either 
iterative  or  finite  methods  [29,  20,  1],  The  most  interesting  property  of 
Pa(x)  is  that,  for  any  xk ,  the  Newton  method  starting  at  xk  produces  an 
iterate  x *+1  that  is  the  approximate  solution  of  (44)  by  using  the  index  set 
J(xk )  =  {i  :  (xk  -a(Qxk  +  q))i  >  0}  as  the  working  active  set  for  constraints 
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(Qx  +  q)>  0.  That  is,  xfc+1  is  the  solution  of  the  following  system  of  linear 
equations: 

(Qx  +  q)i  =  0  for  i  €  «/(xfc),  X{  =  0  for  i  g  J(xk ), 

whenever  the  above  system  has  a  unique  solution.  Therefore,  the  New¬ 
ton  method  for  finding  a  zero  of  VPa(x)  actually  corresponds  to  a  pivotal 
method  that  allows  swapping  of  many  indices  of  working  active  sets  in  each 
step.  However,  it  requires  a  strategy  that  makes  “intelligent”  index  swap¬ 
ping  to  identify  the  active  index  set  of  a  solution  of  (44).  We  show  that 
reduction  of  function  value  of  Pa(x)  can  be  used  for  this  purpose.  Consider 
the  partition  of  IRn  as  a  union  of  following  polyhedral  sets: 

nr  =  U  Wj, 

J 


where 


Wj  =  {x  €  IRn  :  (x  -  a(Qx  +  q))i  >  0  for  i  €  J, 

(45) 

(x  -  a(Qx  +  <?)),•  <  0  for  i  g  J} . 

Note  that  Wj  contains  a  solution  x*  of  (44)  if  and  only  if 
x*  >  0,  (Qx*  +  q)i  =  0,  for  i  6  J, 

(Qx*  +  q)i  >  0,  x*  =  0,  for  i  £  J . 

Therefore,  our  objective  of  using  Pa(x )  for  solving  (44)  is  to  find  an  iterate  xk 
such  that  Wj^xh}  contains  a  stationary  point  or  a  local  minimizer  of  Pa(x). 
This  can  be  achieved  by  using  any  descent  method  for  a  local  minimizer  of 
Pa(x).  In  general,  we  would  like  to  use  the  Newton  direction  as  the  search 
direction.  However,  due  to  nonconvexity  of  Pa(x )  (when  Q  is  not  positive 
semidefinite),  the  Newton  method  with  a  line  search  might  get  stuck  at 
a  nonstationary  point.  In  such  a  case,  we  switch  to  a  gradient  direction 
and  continue  the  search  for  a  region  Wj  that  contains  a  solution  of  (44). 
If  the  linear  systems  for  the  Newton  directions  are  nonsingular,  then  our 
method  is  a  natural  combination  of  a  descent  method  and  a  pivotal  method. 
Geometrically,  our  method  makes  the  iterates  move  toward  a  local  minimizer 
of  Pa(x);  and  algebraically,  the  iterates  facilitate  index  swapping  to  make 
the  current  working  active  sets  more  and  more  accurate.  Once  the  current 
iterate  lands  in  a  region  Wj  containing  a  stationary  point  of  Pa(x )  (which 
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is  guaranteed  by  the  descending  nature  of  our  method),  the  Newton  method 
finds  a  stationary  point  of  Pa(x )  (or  a  solution  of  (44))  in  one  step.  See 
Subsection  5.4  for  further  explanation. 

Recently,  there  are  many  papers  on  merit  functions  of  nonlinear  comple¬ 
mentarity  problems  and  variational  inequality  problems  [3,  4,  5,  9,  12,  13, 
22,  25, 27,  30,  31,  35,  36].  See  a  survey  by  Fukushima  [8]  for  more  references. 

The  section  is  organized  as  follows.  Subsection  5.2  is  devoted  to  the 
discussion  of  properties  of  the  merit  function  Pa(x).  In  Subsection  5.3,  we 
derive  new  error  bounds  for  approximate  solutions  of  (44)  in  terms  of  Pa(x). 
In  Subsection  5.4,  we  propose  a  Newton-type  method  for  finding  a  solution  of 
(44)  and  establish  its  finite  termination  under  some  mild  conditions.  Possible 
extension  to  symmetric  nonlinear  complementarity  problems  is  discussed  in 
Subsection  5.5.  Final  comments  and  conclusions  are  given  in  Subsection  5.6. 

We  conclude  this  subsection  by  introducing  the  notations  and  terminol¬ 
ogy  used  in  this  section.  Let  I  be  the  n  x  n  identity  matrix.  For  an  n  x  n 
matrix  Q ,  Q  is  said  to  be  copositive  if  xTQx  >  0  for  all  vectors  x  >  0,  and  Q 
is  said  to  be  strictly  copositive  if  xTQx  >  0  for  all  vectors  r>0,x^0.  For 
other  matrix  classes  mentioned  in  this  section,  we  refer  the  reader  to  [1].  For 

x  G  ]Rn,  ||x||  =  (E?=i^)?  and  IIOII  =  sup{||<2x||  :  x  G  Rn,||s||  =  1}.  Let 
xT  (or  Qt)  denote  the  transpose  of  x  (or  Q).  For  an  index  set  *7,  xj  denotes 
the  vector  obtained  from  deleting  the  components  of  x  whose  indices  are  not 
in  J.  DJ  denotes  the  diagonal  matrix  whose  ith  diagonal  entry  is  1  if  i  G  J 
and  0  otherwise.  We  use  Jc  =  {i  :  i  £  J}  to  denote  the  complement  of  J . 
For  index  sets  Qjxj2  1S  the  matrix  obtained  from  deleting  the  rows  of 

Q  whose  indices  are  not  in  J\  and  the  columns  of  Q  whose  indices  are  not 
in  t/2 .  For  a  vector  x  G  IRn  and  a  subset  K  of  ]Rn,  the  distance  from  x  to 
K  is  defined  as  follows: 

dist(; i,K)  =  inf{||x  -  y||  :  y  G  K}. 

If  K  is  closed  and  convex,  the  orthogonal  projection  of  x  to  K  is  the  unique 
vector  x*  in  K  such  that  ||x  —  x*||  =  dist(x,A").  In  the  case  that  K  is 
a  union  of  finitely  many  polyhedral  sets,  such  as  the  set  of  local/global 
minimizers  of  Pa(x),  there  might  be  several  x*’s  in  K  such  that  ||x  -  x*||  = 
dist(x,if).  In  such  a  case,  let  us  assume  that  Hk(x)  is  a  vector  in  K  such 
that  ||x  —  II*:(x)||  =  dist(x,  K).  The  sign  of  a  real  number  t  is  denoted  by 
sign(^). 


44 


5.2  Properties  of  Merit  Function  Pa(x ) 

Recently,  there  are  much  attention  given  to  merit  functions  for  the  following 
nonlinear  complementarity  problem  [3,  4,  5,  9,  12,  13,  22,  25,  27,  30,  31,  35, 
36]: 

*  >  0,  F(x)  >  0,  xtF(x )  =  0,  (46) 

where  F(x)  :  lRn  — *•  ]Rn  is  differentiable. 

One  standard  approach  of  constructing  a  merit  function  for  (46)  is  to 
find  a  function  M(x)  >  0  such  that  x*  is  a  solution  of  (46)  if  and  only  if 
M(x*)  =  0.  As  a  result,  (46)  can  be  reformulated  as  as  an  unconstrained 
(global)  minimization  of  M(x).  (For  merit  functions  with  constraints  or 
reformulations  of  (46)  as  a  constrained  minimization  problem,  see  a  survey 
by  Fukushima  [8].)  Such  approaches  lead  to  Mangasarian  and  Solodov’s 
implicit  Lagrangian  [27]: 

M„(x)  =  xrF(x)  +  -  af(x))+l|2 

-IMN  ||(F(*)-«*)+|M|f'MI|2), 

and  Bermeister  and  Fischer’s  NCP-function  [4,  5]: 

•(*)  ~  \  (\/^  +  Fi(x)2  -  a:,-  -  Fi(x))  . 

However,  it  is  difficult  to  design  an  algorithm  for  computing  a  zero  (or  a 
global  minimizer)  of  a  nonnegative  function  M(x).  Most  numerical  meth¬ 
ods  can  only  find  a  stationary  point  or  a  local  minimizer  of  M(x ),  which 
might  not  be  a  solution  of  (46).  Therefore,  it  is  important  to  know  when 
the  stationary  points  of  a  merit  function  M(x)  are  actually  solutions  of 
(46).  Another  issue  about  a  merit  function  is  whether  or  not  its  level  sets 
are  bounded.  In  general,  it  is  not  easy  to  ensure  the  convergence  of  iterates 
generated  by  a  descent  method  for  minimizing  a  nonconvex  function.  How¬ 
ever,  if  a  merit  function  M(x)  has  bounded  level  sets,  then  a  minimizing 
sequence  for  M(x)  is  bounded  and  has  at  least  one  cluster  point. 

Unlike  Ma(x)  and  ^(x),  the  merit  function  jPa(x)  exploits  linearity  of 
F(x)  =  Qx  +  q  and  symmetry  of  Q.  As  a  result,  it  preserves  the  structure 
of  original  LCP  (44)  and  has  many  properties  that  Ma(x )  and  ^(x)  do 
not  have.  In  this  subsection,  we  give  complete  characterizations  for  various 
properties  of  Pa(x),  including  equivalence  of  stationary  points  of  Pa(x)  and 
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solutions  of  (44),  existence  of  global  minimizers,  boundedness  of  level  sets, 
and  convexity  of  Pa(x).  We  also  discuss  the  convexity  of  Ma(x)  by  using 
the  relation  Ma(x )  =  Pi_(x)  —  Pa(x). 

a 

5.3  Stationary  Points  and  Convexity 

In  general,  stationary  points  of  Ma(x)/^(x)  are  not  necessarily  solutions  of 
(46)  and  there  is  no  known  relation  between  the  convexity  of  Ma(x)/''f>(x) 
and  the  monotonicity  of  P(x),  even  if  F(x)  =  Qx  +  q.  Recently,  Peng  [30] 
proved  the  strict  convexity  of  Ma(x )  for  large  a  and  strongly  monotone 
LCP.  In  this  subsection,  we  obtain  the  equivalence  of  stationary  points  of 
Pa(x)  and  the  solutions  of  (44),  as  well  as  the  equivalence  of  the  convexity  of 
Pa(x)  and  the  monotonicity  of  (44).  For  a  positive  definite  Q,  we  show  that 
PQ{x)  is  strictly  convex  for  small  a  and  strictly  concave  for  large  a.  We  also 
establish  a  relation  between  Ma(x )  and  Pa{x):  Ma (x)  =  Pi_(x)  —  Pa(x).  As 

a 

a  consequence,  we  recover  Peng’s  result  [30]  on  strict  convexity  of  Ma(x). 

Note  that  definition  of  Pa(x)  does  not  require  the  symmetry  of  Pa{x). 
Therefore,  some  properties  of  Pa(x)  in  this  subsection  are  given  for  an  ar¬ 
bitrary  n  x  n  matrix  Q . 

Theorem  33  For  any  n  x  n  matrix  Q, 

VP a(x)  =  ~(J  -  aQT )  (x  -  (x  -  a(Qx  +  ?))+)  +  ±(QT  -  Q)x.  (47) 
If  Q  is  symmetric,  then 

VPa(x)  =  i(J  -  otQ )  -  (x  -  a(Qx  +  ■  (48) 

Theorem  34  Suppose  that  Q  is  symmetric  and  0  <  a||Q||  <  1.  Then  x*  is 
a  stationary  point  of  Pa(x)  if  and  only  if  x*  is  a  solution  of  (44)- 

Remark.  Stronger  conditions  are  required  for  the  equivalence  of  stationary 
points  of  Ma(x)/^(x)  and  the  solutions  of  (46).  Yamashita  and  Fukushima 
[36]  proved  that  x  is  a  stationary  point  of  Ma(x)  ( a  >  1)  if  and  only  if 
x  solves  (46),  under  the  assumption  that  VP(x)  is  positive  definite  for  ev¬ 
ery  x  6  lRn.  Independently,  Jiang  [12]  proved  the  same  result  under  the 
assumption  that  VP(x)  is  a  P-matrix  for  every  x  G  ]Rn.  Yamashita  and 
Fukushima  [36]  also  showed  that  there  exists  a  (nonlinear)  strictly  monotone 
function  P(x)  such  that  some  stationary  point  of  Ma(x)  is  not  a  solution  of 
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(46).  For  the  merit  function  ^(x),  Geiger  and  Kanzow  [9]  proved  that  x  is  a 
stationary  point  of  $  if  and  only  if  x  solves  (46),  under  the  assumption  that 
VF(x)  is  positive  semidefinite  for  every  x  £  ]Rn.  Independently,  Facchinei 
and  Soares  [3]  proved  the  same  result  under  the  assumption  that  F(x)  is 
Po-function. 

In  the  case  that  F(x)  =  Qx  +  q  and  Q  is  symmetric,  these  results  lead  to 
the  equivalence  of  stationary  points  of  ^(z)  (or  Ma(x))  and  the  solutions  of 
(46)  under  the  assumption  that  Q  is  positive  semidefinite  (or  Q  is  positive 
definite).  However,  for  positive  semidefinite  Q,  the  augmented  Lagrangian 
Pa(x)  is  actually  convex. 

For  convexity  analysis  of  Pa(x),  we  need  the  following  lemma  about 
convexity  of  a  differentiable  piecewise  quadratic  function. 

Lemma  35  [19,  Lemma  2.1]  Let  g(x)  be  a  differentiable  piecewise  quadratic 
function  and  Vg(x)  =  Bx  +  b  +  /3CT(Cx  +  c)+.  If  B  and  ( B  +  (iCTC)  are 
positive  semidefinite  (or  positive  definite),  then  g(x)  is  a  convex  (or  strictly 
convex)  function. 

Theorem  36  Suppose  that  Q  is  symmetric  and  0  <  a||$||  <  1.  Then 
PQ(x)  is  convex  (or  strictly  convex)  if  and  only  if  Q  is  positive  semidefinite 
(or  positive  definite). 

For  nonsymmetric  Q ,  we  can  still  characterize  the  strict  convexity  of 
Pa{x). 


Theorem  37  Suppose  that  Q  is  an  nxn  matrix.  Then  the  following  state¬ 
ments  are  equivalent . 

1°.  Q  is  positive  definite. 

2°.  There  is  e  >  0  such  that  Pa(x)  is  strictly  convex  for  0  <  a  <  e. 

3°.  There  is  A  >  0  such  that  PQ(x)  is  strictly  concave  for  a  >  A. 

Remark.  The  above  proof  also  shows  that  Pa(x)  is  convex  for  small  a 
if  and  only  if  Q  is  positive  semidefinite  and  Qx  =  0  whenever  xTQx  =  0. 
Moreover,  Pa(x)  can  not  be  concave  for  large  a  if  Q  is  not  positive  definite. 

With  Theorem  37,  we  can  derive  the  strict  convexity  of  Ma(x )  for  affine 
and  strongly  monotone  P(x)  by  using  the  following  representation  of  Ma(x) 
by  Pa(x). 
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Theorem  38  Suppose  that  F(x )  =  Qx  +  q  and  a  >  0.  Then,  Ma(x)  = 
Pi(x)  -  Pa(x )  and  Ma(x)  =  -Mi_(x). 

a  a 


With  Theorem  37  and  Theorem  38,  we  recover  the  following  corollary 
about  the  strict  convexity/concavity  of  Ma(x)  for  affine  and  strongly  mono¬ 
tone  F(x ),  which  was  first  proved  by  Peng  [30,  Theorem  3.1]. 


Corollary  39  Suppose  that  F(x)  =  Qx  +  q  and  Q  is  positive  definite . 
Ma(x )  is  a  strictly  convex  function  if 


a  >  max « 

{non.  W‘ll.2  («-,+(«-')T)'1 

,2 

!(e+«T*|} 

Moreover ,  Ma(x )  is  a  strictly  concave  function  if 

a  <  min  j 

[  1  1  1 

i  1 

JlOII’llfl-'ll’alw-'+w-uV 

’? 

2 

(Q  +  QT)~X  ) 

Then 


(49) 


(50) 


Remark.  Note  that  the  lower  bound  of  a  for  the  strict  convexity  of  Ma(x) 
given  by  Peng  [30,  Theorem  3.1]  is 


a  >  max 
11*11=1 


1  + 

2  xTQx 


In  the  case  that  Q  is  symmetric,  then 

m5 ~2i*Qx'X  -  5(ll<rl11  +  llall) - 

Thus,  Peng’s  lower  bound  for  a  is  sharper  than  (49). 

So  far,  we  are  not  aware  of  any  result  on  convexity  of  ®(x).  Luo  and 
Tseng  [25]  introduced  another  class  of  differentiable  merit  functions  related 
to  $(x)  that  are  convex  if  F(x)  is  monotone. 


5,4  Global  Minimizers  and  Bounded  Level  Sets 

Unlike  Ma(x)  and  ^(x),  which  are  based  on  the  equivalence  of  the  global 
minimizers  of  the  merit  function  and  the  solutions  of  (46),  Pa(x)  might  not 
have  a  global  minimizer.  This  is  undesirable.  However,  we  show  that  Pa(x) 
is  bounded  below  and  has  bounded  level  sets  if  Q  is  strictly  copositive.  We 
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also  give  characterizations  for  the  existence  of  a  global  minimize?  of  Pa(x) 
and  for  boundedness  of  level  sets  of  Pa(x). 

The  existence  of  global  minimizers  of  Pa(x)  and  boundedness  of  level 
sets  of  Pa{x)  are  closely  related  to  the  behavior  of  the  following  quadratic 
programming  problem: 

min  —xtQx  +  qTx.  (51) 

Lemma  40  [15]  Suppose  that  Q  is  symmetric  and  0  <  2||Q||  <  1.  Then 
the  quadratic  programming  problem  (51)  has  a  global  solution  if  and  only 
if  Pa(x)  has  a  global  minimizes  Moreover,  x*  is  a  local  (or  isolated  local) 
solution  of  (41)  if  and  only  if  x*  is  a  local  (or  isolated  local)  minimizer  of 

The  above  lemma  shows  that  the  the  reformulation  of  (51)  as  uncon¬ 
strained  minimization  of  Pa(x)  does  not  change  any  intrinsic  characteristics 
of  (51).  The  above  lemma  reduces  the  existence  of  a  global  minimizer  of 
PQ(x)  to  the  problem  of  the  existence  of  a  global  solution  of  (51).  Eaves  [2] 
proved  that  the  existence  of  a  global  solution  of  a  quadratic  programming 
problem  is  equivalent  to  the  lower  boundedness  of  the  objective  function  on 
any  ray  in  the  feasible  set. 

Lemma  41  [2,  Corollary  8]  The  quadratic  programming  problem  (51)  has 
a  global  solution  if  and  only  if  Q  is  copositive  and  qTx  >  0,  whenever  x  >  0 
and  xTQx  =  0. 

Putting  all  these  results  together,  we  have  the  following  characterizations 
for  the  existence  of  a  global  minimizer  of  Pa{x). 

Theorem  42  Suppose  that  Q  is  symmetric  and  0  <  2a||Q||  <  1.  Then  the 
following  statements  are  equivalent: 

1°.  Pa(x)  has  a  global  minimizer . 

2°.  Pa(x)  is  bounded  below . 

3°,  The  quadratic  program  (51)  has  a  global  solution . 

4°.  Q  is  copositive  and  qTx  >  0 ,  whenever  x  >  0  and  xTQx  —  0. 

In  order  to  give  a  characterization  for  boundedness  of  level  sets  of  PQ(x), 
we  need  the  following  two  lemmas  about  the  connection  between  Pa(x)  and 
the  objective  function  of  (51). 


49 


Lemma  43  [15]  Suppose  that  a  >  0.  Then  Pa(x )  <  | xTQx  +  qTx  ifx  >  0. 

Lemma  44  Let  f(x )  =  \xTQx  +  qTx,  E  =  (/  -  aQ),  and  x°  =  aE~Aq. 
Then,  for  any  index  set  J  and  x  €  Wj, 

Pa(x)  >  Pa(x°)  +  gj  (j DJC(x  -  x°),  DJQDJ(x  -  x°) )  +  /  (j DJ(Ex  -  aq))  , 

where  Wj  is  defined  as  in  (45),  gj(y,z)  is  a  strictly  convex  quadratic  func¬ 
tion  of  (y,z),  Jc  =  {i  :  i  £  J},  and  DJ  denotes  the  diagonal  matrix  whose 
ith  diagonal  entry  is  1  if  i  6  J  and  0  otherwise . 

In  the  following  theorem,  we  give  two  characterizations  for  boundedness 
of  level  sets  of  Pa(x):  one  in  terms  of  boundedness  of  level  sets  for  (51)  and 
another  in  terms  of  Q  and  q . 

Theorem  45  Suppose  that  Q  is  symmetric  and  0  <  2a||Q||  <  1.  Then  the 
following  statements  are  equivalent: 

1°.  Pa(x)  has  bounded  level  sets.~ 

2°.  The  level  set  for  (51),  {z  6  Hn  :  x  >  0,  \xTQx  +  qTx  <  7},  is  bounded 
for  every  7  <  00. 

3°.  Q  is  copositive  and  qTx  >  0,  whenever  x  >  0,  x  /  0,  and  xTQx  =  0. 

Corollary  46  Suppose  that  Q  is  symmetric  and  strictly  copositive.  Then 
the  level  sets  of  Pa(x)  are  bounded  if  0  <  2a||Q||  <  1. 

Remark.  Yamashita  and  Fukushima  [36]  proved  that  Ma(x)  (a  >  1)  has 
bounded  level  sets  if  F(x)  is  strongly  monotone  and  Lipschitz  continuous. 
For  merit  function  ^(x),  Geiger  and  Kanzow  [9]  proved  that  the  level  sets  of 
$  are  bounded  if  F(x)  is  strongly  monotone.  Independently,  Facchinei  and 
Soares  [3]  proved  that  the  level  sets  of  ^  are  bounded  if  F(x)  is  a  uniformly 
P-function. 

5.5  Local  and  Global  Error  Bounds 

Let  S*  be  the  solutions  of  (44),  L*  be  the  set  of  local  minimizers  of  Pa(x ), 
and  G*  be  the  set  of  global  minimizers  of  Pa(x ).  Then,  for  0  <  a||Q||  <  1,  we 
have  G*  CL*  C  S*.  In  general,  G*  /  L*,  L *  £  S *,  and  G*  £  S *.  However, 
if  Q  is  positive  semidefinite,  then  Pa(x)  is  convex  for  0  <  a||Q||  <  1  (cf. 
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Theorem  36)  and  G*  =  L*  —  S*.  There  are  many  results  about  error 
estimates  for  dist(x,  £*),  the  distance  from  x  to  S*.  See  [23,  24,  28,  26,  14, 
22,  21]  and  references  therein.  For  0  <  o||Q||  <  1,  (7  —  aQ)  is  a  positive 
definite  matrix  and,  by  Theorem  33,  we  have  ||VPa(x)||  ss  ||x  -(x-  a(Qx  + 
#)+||.  Thus,  known  error  bounds  in  terms  of  ||x  —  (x  —  a(Qx  +  g)+||  can  be 
rewritten  in  terms  of  ||VPa(x)||.  In  this  subsection,  we  derive  local/global 
error  estimates  of  dist(x,7*)  and  dist(x,G*)  in  terms  of  Pa(x). 

5.6  Local  Error  Bounds 

Since  Pa(x )  can  not  distinguish  stationary  points  of  Pa(x)  from  nonsta¬ 
tionary  points  of  Pa(x),  we  can  not  use  expressions  of  Pa(x)  to  estimate 
dist(x,  S').  However,  we  can  use  Pa(x)  to  estimate  the  distance  from  x  to 
the  set  of  local/global  minimizers  of  Pa(x). 

First  we  need  the  following  lemma  about  the  structure  of  L*  and  G*. 

Lemma  47  Suppose  that  Q  is  symmetric,  a  >  0,  and  is  a  polyhedral 

set.  Then  WnS*  =  WC\L*  ifWnL*  f  0  andW(\S*  -  WnG*  ifWf\G*  ±  0. 

The  next  lemma  gives  error  bounds  on  each  polyhedral  set  Wj. 

Lemma  48  Suppose  that  Q  is  symmetric  and  0  <  2o||Q||  <  1.  Then  there 
exists  a  positive  constant  7  such  that 

~\J PJx)  ~  PQ(n^nL*(a;))  <  dist(x,  WjDL*)  <  7 y/Pa(x)  -  Pa(nM/jni.(x)) 
for  x  €  Wj  with  Wj  fl  7*  /  0. 

In  order  to  replace  dist(x,  Wjf)L*)  by  dist(x,  L*),  we  need  the  following 
lemma  for  lower  bound  of  dist(x,  L*)  or  dist (x,G*). 

Lemma  49  There  exists  a  positive  constant  c  such  that 

II*  -  *1  >  «>/lP«(x)  -  pa(x*)|  for  X  €  1R”,  X*  e  5*.  (52) 

Theorem  50  Suppose  that  Q  is  symmetric  and  0  <  2a||Q||  <  1.  Let  L*  be 
the  (nonempty)  set  of  local  minimizers  of  PQ(x).  Then  there  exist  positive 
constants  6,tj  such  that,  for  x  £  ]Rn  with  dist (x,  L*)  <  S, 

MW  -  <  dist(x,  L *)  <  fiyjpa(x)  -  Pa(nL.(x)).  (53) 
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Theorem  51  Suppose  that  Q  is  symmetric  and  0  <  2a||Q||  <  1.  Let  G*  be 
the  (nonempty)  set  of  global  minimizers  of  Pa{x).  Then  there  exist  positive 
constants  6,rj  such  that,  for  x  6  lRn  with  dist(x,(j*)  <  6, 

^ a, min  ^  dist(x,(?  )  ^  1)^ Pa(yX)  —  Pajomi  (54) 

where  Pa< ^  =  inf{Pa(x)  :  x  €  lRn}.  . 

5.7  Global  Error  Bounds 

In  general,  (53)  and  (54)  do  not  hold  if  x  is  an  arbitrary  vector  in  1R”.  In 
this  subsection,  we  give  characterizations  for  the  existence  of  global  error 
bounds  for  dist(x, G*).  Note  that  we  can  not  use  >/Pa(x)  —  Pa(Il£,»(x))  as 
a  global  measure  of  dist(x,P*),  since  it  is  likely  to  have  some  x  €  ]R"  \  L* 
such  that  Pa(x )  =  Pa(n.L*(x)). 

Theorem  52  Suppose  that  Q  is  symmetric  and  0  <  2a||Q||  <  1.  Then  the 
following  statements  are  equivalent. 

1°.  Pa(x)  is  bounded  below  for  every  q  6  ]Rn. 

2°.  Pa(x)  has  bounded  level  sets  for  every  q  G  !Rn. 

3°.  Q  is  strictly  copositive. 

4°.  For  any  q,  Pa> min  =  inf{Pa(x)  :  x  €  ]Rn}  >  — oo  and  there  exists  a 
positive  constant  7  =  7 (Q,q)  such  that 

~^Pat(*r)  P 2, min  5;  dist(x,(j  )  <  \J Pa(^X )  —  Pa, min  for  ®  €  IR”. 

Note  that  G*  C  L*  C  S*.  In  general,  G*  ±  L *,  L*  #  5*,  and  G*  ±  S*. 
However,  if  Q  is  positive  semidefinite,  then  Pa(x)  is  convex  for  0  <  a||Q||  <  1 
(cf.  Theorem  36)  and  G*  =  L*  =  S*.  In  this  case,  we  have  the  follow¬ 
ing  global  error  bound  for  symmetric  and  monotone  linear  complementarity 
problems  (cf.  Corollary  2.8  in  [14]). 

Theorem  53  Suppose  that  Q  is  symmetric  and  positive  semidefinite,  0  < 
a | |Q  ||  <  1,  and  the  solution  set  S*  of  (44)  is  not  empty.  Then  there  exists 
a  positive  constant  7  such  that 

dist(x,  S*)  <  7 (( PQ(x )  -  Patmin)  +  yfK(x)  -  Pc, min)  for  X  G  Rn. 
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5.8  A  Newton-Type  Method 

By  exploiting  the  structure  of  VPa(x),  we  can  get  an  iterative  algorithm  for 
finding  a  solution  of  (44)  in  finitely  many  iterations. 

Algorithm  54  Suppose  that  Q  is  symmetric  and  the  principle  submatrices 
of  Q  are  nonsingular  (i.e.,  Q  is  nondegenerate).  Then  generate  a  sequence 
of  iterates  as  follows: 

Step  0.  choose  positive  constants  £  >  0,  0</?<7<l,  0<  2a||Q||  <  1; 
set  b  =  -aq  and  E  =  (/  -  aQ);  let  k  =  0  and  x°  €  Htn; 

Step  1.  compute  rk  =  xk  —  ( Exk  +  6)+; 

Step  2.  if  rk  =  0,  then  output  the  solution  xk  and  stop; 

Step  3.  let  Jk  =  {i :  ( Exk  +  b)i  >  0}  and  Jk  =  {i :  ( Exk  +  b ),•  <  0}; 

Step  4.  let  ukJk  =  Qj^  (Qjkjkr^  -  ±rkJ  and  u\  =  -rkJk; 

Step  5.  if  xk  +  uk  =  {E(xk  +  uk )  +  b)+,  then  output  the  solution  xk  and 
stop; 

Step  6.  if  |(rfc)Tu*  >  £||r*||2,  zk  =  uk;  else,  zk  =  -rk; 

Step  7.  let  /3  <  r)k  <  i  and  find  the  stepsize  tk  ^  0  such  that 

-  sign (tk)(zk)TVPa(xk  +  tkzk)  =  Vk  |(z*)TVPa(xfc)|  (55) 

and \  for  t  between  0  and  tk, 

-  sign (tk)(zk)TVPa{xk  +  tzk)  >  i lk  \(zk)TVPa(xk)\ ;  (56) 

Step  8.  set  xk+1  =  xk  +  tkZk; 

Step  9.  update  k  by  k  +  1  and  go  back  to  Step  1. 

Remark.  The  algorithm  is  very  much  like  a  Newton  method  for  solving  the 
piecewise  linear  equation:  x  -  (a;  -  a(Qx  +  q))+  =  0.  Note  that  the  Newton 
direction  uk  for  P^a;)  at  xk  is  the  solution  of  the  following  linear  system: 

i(J  -  aQ)(I  -  DJk(I  -  aQ))u  =  - VPa(xk )  =  -±(J  -  aQ)rk, 
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or  equivalently, 


(/  -DJk(I  -<xQ))u  =  -rk.  (57) 

That  is,  the  Newton  direction  for  Pa(x )  is  the  Newton  direction  for  the 
equation:  x  —  (x  —  a(Qx  +  <jr))+  =  0. 

We  can  rewrite  (57)  as  follows: 

Ui  =  -rk  for  i  g  Jk,  ct(Qu)i  =  —rk  for  i  €  <4.  (58) 

Using  matrix  and  vector  partition,  we  can  write  the  second  set  of  equations 
in  (58)  as 

aQjkjkujk  +  aQjkjkuJk  =  -rkJk .  (59) 

Thus,  by  (58)  and  (59),  we  get  that  uk ■  =  —rjk  and 

A  =  QiU  (« vA  -  r$.)  • 

Therefore,  it  is  possible  to  use  matrix  updating  techniques  for  computing 
the  inverse  of  Qjkjk.  If  Q  is  a  sparse  matrix,  then  (58)  is  also  a  sparse 
linear  system.  If  Qjkjk  is  nonsingular  and  Wjk  contains  a  stationary  point 
of  Pa(x),  then  (xk  +  uk)  is  the  stationary  point  of  the  quadratic  function 
Pa(x)  on  Wjk. 

The  line  search  can  be  done  in  at  most  0(n2)  operations.  Note  that 

n 

h(t )  =  a(zk)TVPa(xk  +  tzk )  =  a0  +  ait  -  Cj(cjt  -  d,)+, 

i=i 

where  a0  =  ( zk)T Exk ,  ai  =  ( zk)T Ezk ,  d,  =  -( Exk  +  6)j,  and  c,-  =  ( Ezk)i . 
Without  loss  of  generality,  we  assume  that  c,  ^  0.  (If  c,  =  0,  delete  c,(c,t  + 
d,)+  from  the  summation.)  Let  y,  =  After  a  sorting  of  {yi,  •  •  ■  ,yn}  and 
relabeling  (with  C?(ralogra)  operations),  we  may  assume 


Vi  <  V2  <  •••  <  Vn- 

If  h(0)  <  0,  we  can  evaluate  h(t/,)  for  >  0  and  find  yT  >  0  such  that 
h(yr)  >  T}kh{ 0)  and  /i(y.)  <  r)kh(0)  for  0  <  yi  <  yT.  If  yT^  >  0,  let 
yr-\  =  2/r-i;  else,  yr-\  =  0.  Then  there  exists  a  unique  tk  6  [yr-i,yr] 
that  can  be  used  as  our  stepsize.  Note  that  tk  is  the  unique  solution  of  the 
following  linear  equation: 

a0  +  ait-  c,(c,t  -  di)  -  ^  c,(c,t  -  d{)  —  rjkh(0). 

ci  >0,Vi  >3/r_i  c,  <0,7/,  <yr-l 
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Similarly,  if  h( 0)  >  0,  then  we  can  find  a  stepsize  i*  <  0  such  that  (55)  and 
(56)  hold. 

We  can  use  any  descent  method  for  minimization  of  Pa(;r)  to  identify 
an  active  index  set  of  a  solution  of  (44).  The  above  algorithm  tries  to  take 
advantage  of  the  Newton  directions  in  minimization  process. 

Now  we  state  the  finite  termination  of  Algorithm  54. 

Theorem  55  Suppose  that  Q  is  symmetric,  the  principle  submatrices  of 
Q  are  nonsingular ,  and  (51)  is  bounded  below.  Then ,  for  any  given  initial 
guess  x°  and  0  <  (5  <  7  <  1,  Algorithm  54  always  finds  a  solution  of  (44) 
in  finitely  many  iterations. 

Remark,  The  assumption  that  principle  submatrices  of  Q  is  nonsingular 
is  not  essential.  It  is  possible  not  to  use  this  assumption  in  the  design  of  an 
iterative  algorithm  that  terminates  in  finite  iterations  (cf.  [18]). 


5.9  Extension 

Consider  the  nonlinear  complementarity  problem  (46).  Assume  that  F(x) 
is  a  conservative  field  or  VF(x)  is  symmetric;  i.e., 


dFi  dFi  .  . 

Then  there  exists  a  potential  function  f(x)  of  F(x)  such  that 

—  Fi ,  for  1  <  i  <  n. 

OXi 

We  use  the  following  notation  to  denote  such  a  potential  function: 


/(*)  =  jf  F(y)  dv  =  HjcFi(y)  dy *■> 


where  the  integration  is  the  line  integral  in  differential  form  and  C  is  any 
smooth  curve  starting  at  0  and  ending  at  x.  Using  the  potential  function 
f(x)  of  F(x),  we  can  define  the  following  merit  function  for  (46): 


Fa(x) 
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Note  that 


VPa(x)  =  i(l-  aVF(xj)  (*-(*-  aF(x) )+)  . 

Therefore,  if  0  <  a||V.F(:r)||  <  1  for  x  6  lRn,  then  the  stationary  points  of 
PQ(x)  are  the  solutions  of  the  symmetric  nonlinear  complementarity  problem 
(46). 

However,  if  VP(x)  is  not  bounded,  then  we  can  not  use  a  fixed  a  for 
all  x.  One  possible  remedy  is  to  adjust  the  values  dynamically.  That  is,  we 
choose  ajt  such  that  0  <  a*||Vi'1(x*)||  <  1  and  generate  x*+1  by  minimizing 
Pak{x )  with  starting  point  xk . 

5.10  Conclusions 

We  have  studied  many  properties  of  the  augmented  Lagrangian  PQ(x )  for 
symmetric  linear  complementarity  problems.  Unlike  merit  functions  Ma(x) 
and  \P(x),  stationary  points  of  PQ(x)  are  always  solutions  of  (44)  if  0  < 
a||Q||  <  1.  Also  the  merit  function  Pa(x)  is  convex  if  the  original  LCP  (44) 
is  monotone.  However,  Pa(x)  is  not  always  bounded  below.  We  have  derived 
characterizations  for  the  existence  of  global  minimizers  and  the  boundedness 
of  level  sets  of  Pa(x).  In  particular,  if  Q  is  strictly  copositive,  then  Pa(x) 
has  bounded  level  sets. 

One  interesting  result  is  the  connection  between  Pa(x)  and  Mangasar- 
ian  and  Solodov’s  implicit  Lagrangian  Ma(x ):  Ma(x )  =  Pi_(x)  —  Pa(x). 

Q 

Based  on  the  convexity  analysis  of  Pa(x),  we  have  recovered  a  result  by 
Peng  [30]  about  strict  convexity  of  Ma(x )  for  large  a  and  strongly  mono¬ 
tone  LCP.  Since  Ma{x)  =  —M i(x),  Ma(x)  is  a  strictly  concave  function  for 

a 

small  a,  if  F(x)  —  Qx  +  q  and  Q  is  positive  definite.  This  sheds  new  light 
on  the  fact  that  Mangasarian  and  Solodov  [27]  reformulate  (46)  as  uncon¬ 
strained  (global)  minimization  of  Ma(x )  for  a  >  1  and  Tseng,  Yamashita, 
and  Fukushima  [35]  reformulate  (46)  as  unconstrained  (global)  maximiza¬ 
tion  of  Ma{x)  for  0  <  a  <  1. 

A  convenient  choice  of  a  is  by  using  the  supremum  norm  of  Q: 

1 

~  - 

3maxX;i<3« 

J=1 


a  = 


(60) 


Then  0  <  2a||Q||  <  1.  Let 


2 


PQ,q(X)  =  {^xTQx  +  ?Tz)  -  f  I \(Qx  +  q)\\2  +  ^ 

Then  it  is  very  easy  to  verify  that  PeQjqiz)  =  PQ,q(x )  for  0  >  0.  That  is, 
the  augmented  Lagrangian  Pq}9(x)  is  invariant  with  respect  to  the  scaling 
of  (44).  Since  the  scaling  of  (44)  does  not  change  the  characteristics  of 
(44),  it  is  natural  to  require  that  merit  functions  for  (46)  keep  the  intrinsic 
characteristics  of  (46)  and  be  invariant  with  respect  to  scaling  of  the  original 
complementarity  problem,  as  pointed  out  by  J.  More  during  his  talk  at  the 
International  Conference  on  Complementarity  Problems. 

Even  though  PQ(x)  has  many  desirable  properties,  it  requires  the  sym¬ 
metry  of  VF(x)  and  linearity  of  F(x).  For  nonlinear  F(x)  with  symmetric 
Jacobian  VF(x),  Pa(x)  can  be  defined  by  using  a  potential  function  of 
F(x)  and  the  stationary  points  of  Pa(x)  are  solutions  of  (46)  if  ||VF(x)||  is 
bounded  and  a  is  small  enough.  Further  study  is  necessary  to  understand 
the  behavior  of  PQ(x )  for  symmetric  nonlinear  complementarity  problems. 

Since  Ma{x)  is  also  a  differentiable  piecewise  quadratic  function,  it  is 
possible  to  use  a  Newton-type  method  for  finding  a  stationary  point  of  Ma(x) 
in  finitely  many  steps.  However,  the  equations  for  Newton  directions  of 
Ma(x )  are  not  as  easy  to  solve  as  those  for  Newton  directions  of  PQ(x). 


(a(Qx  +  q)  -  x'j  + 
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Figure  1:  Actual  Data 


Figure  2:  Surface  Fitting 


AIRPLANE  PHYSICAL  PARAMETERIZATION 
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Fig.  4  -  Airfoil  Section  Definition 
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Fig.  5  -  Dirichlet  Boundary  Conditions 
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Noisy  Signal  Versus  Noiseless  Signal 
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